Const correctness of pointers against safe checks during initialization - c++

Consider a class that allocates a block of memory to use:
typedef unsigned char byte;
class ByteBuffer
{
byte* const m_begin; // const pointer, non-const data
byte* const m_end; // const pointer, non-const data
byte* m_pData; // a non-const pointer, e.g. write-head
public:
ByteBuffer(size_t byteBufferSize)
: m_begin(new byte[byteBufferSize],
m_end(m_begin + byteBufferSize),
m_pData(m_begin)
{}
~ByteBuffer() { delete[] m_begin; }
};
If I want the code maintained, it's nice to ensure that m_begin and m_end are const so as to ensure any math operations regarding the size of the container will be correct (and that someone else coming in won't accidentally update the edges. But the drawback is that I can no longer initialize the data anywhere outside of the constructor's initialization list.
Rare as std::bad_alloc may be, I'm not convinced that calling new[] like I did above is a good idea. So my question is: how would we deal with wanting const pointers to memory to be managed in a class?
I would want the constructor to do something akin to this:
: m_begin(nullptr), m_end(nullptr), m_pData(nullptr)
{
// initialize here
// if initialization is successful, set const & non-const ptrs
}
Am I overthinking this? Does making m_begin a const unique_ptr make all the initialization woes go away somehow?

The above code is essentially the correct way to write it. However, you could consider using std::vector instead as you won't have to explicitly manage memory.
If fact, std::vector already define begin() and end() (which would replace m_pData in your code) and you won't really need m_end replacement as you don't manage the memory yourself.
The equivalent would be begin() + capacity() but as element are not initialized after end you should not access those anyway (probably undefined behavior from language point of view).
Otherwise small improvements are possible like:
Avoid m_ for member variable name. Not much useful.
Avoid p prefix for pointer to data.
In the initialization list, put comma on the beginning of a line (as you have done for :) instead of at the end of previous line. It make it easier to add, remove, move or comment out items when done that way.
If you don't want your object to be moveable or copyable, it is preferable to explicitly specify it:
public:
ByteBuffer(const ByteBuffer &) = delete;
ByteBuffer(ByteBuffer &&) = delete;
ByteBuffer &operator=(const ByteBuffer &) = delete;
ByteBuffer &operator=(ByteBuffer &&) = delete;
Alternatively, you could also use std::unique_ptr and then you won't have to free memory yourself. This could be useful when you class need to make multiple allocations or when an exception could be thrown after the allocation in the constructor.
The best solution really depends on what you want to do with your ByteBuffer class or even if you really need such a class (as you might use std::vector directly instead or doing a simple alias like:
using ByteBuffer = std::vector<unsigned char>;
If fact, the byte typedef might not even be that much useful if you use auto for your loops and iterators.
Speaking of const, I would generally recommend to have it to prevent undesired changes. However, sometime you need to reset those pointers to nullptr during destruction of somewhat complex classes where there could be some mutual destruction for example.
So by having simple classes that respect the SRP (single responsabilty principle), you essentially avoid that kind of problems.

Related

Implementing unique_ptr: Deleting non-allocated objects

My goal here is to implement a simple version of unique_ptr which offers only a constructor, destructor, ->, *, and release().
However, I don't know what to do in the case where a unique_ptr is initialized using a non-allocated pointer.
eg
int i = 0;
unique_ptr<int> p{&i};
If the unique_ptr simply calls delete on it owned pointer, this will produced undefined (and undesirable) behavior, at least as far as I know. What can I do to prevent this?
EDIT: My attempt at the problem is as follows...
template<typename T>
class Uptr
{
public:
Uptr<T>(T* pt) : mp_owned{pt} {}
~Uptr<T>() {delete mp_owned;}
Uptr<T>(const Uptr<T>&) = delete;
Uptr<T>& operator=(const Uptr<T>&) = delete;
T& operator*() const {return *mp_owned;}
T* operator->() const {return mp_owned;}
T* release() {return mp_owned;}
private:
T* mp_owned;
};
You cannot check programmatically how a pointer value was obtained. In your situation (which is highly representative of large parts of real programming!), the solution is to document your interface to require that the pointer value be deletable. You place a precondition on your users, which requires that your users read the documentation and follow it, and you do not and cannot provide in-language methods for validating those preconditions. You pass the burden on to your users.
Such precondition burdens always form a kind of "technical debt", and you want to avoid it as much as you can (but perhaps not at the expense of runtime cost). For example, in the standard library we would strongly discourage the use of the ownership-taking constructor of unique_ptr and instead as the user to use make_unique, which has no preconditions and results in a valid unique_ptr value. That design is exemplary for how you would manage technical debt in the real world.
Unfortunately, there is nothing you can do about this: the ownership of the object must be transferred to unique_ptr<T>, which is not possible if you use addresses of objects in global, static, or automatic areas.
The implementation from the standard library, i.e. std::unique_ptr<T,Deleter>, takes a deleter parameter, which you can use to not delete anything. However, this use is highly questionable, because in this situation you do not need unique_ptr at all.

unique_ptr<int[]> or vector<int>?

If you don't need dynamic growth and don't know the size of the buffer at compile time, when should unique_ptr<int[]> be used instead of vector<int> if at all?
Is there a significant performance loss in using vector instead of unique_ptr?
There is no performance loss in using std::vector vs. std::unique_ptr<int[]>. The alternatives are not exactly equivalent though, since the vector could be grown and the pointer cannot (this can be and advantage or a disadvantage, did the vector grow by mistake?)
There are other differences, like the fact that the values will be initialized in the std::vector, but they won't be if you new the array (unless you use value-initialization...).
At the end of the day, I personally would opt for std::vector<>, but I still code in C++03 without std::unique_ptr.
If you're in a position where vector<int> is even a possibility, you probably want to go with that except in extreme and rare circumstances. And even then, a custom type instead of unique_ptr<int[]> may well be the best answer.
So what the heck is unique_ptr<int[]> good for? :-)
unique_ptr<T[]> really shines in two circumstances:
1. You need to handle a malloc/free resource from some legacy function and you would like to do it in a modern exception safe style:
void
foo()
{
std::unique_ptr<char[], void(*)(void*)> p(strdup("some text"), std::free);
for (unsigned i = 0; p[i]; ++i)
std::cout << p[i];
std::cout << '\n';
}
2. You've need to temporarily secure a new[] resource before transferring it onto another owner:
class X
{
int* data_;
std::string name_;
static void validate(const std::string& nm);
public:
~X() {delete [] data_;}
X(int* data, const std::string& name_of_data)
: data_(nullptr),
name_()
{
std::unique_ptr<int[]> hold(data); // noexcept
name_ = name_of_data; // might throw
validate(name_); // might throw
data_ = hold.release(); // noexcept
}
};
In the above scenario, X owns the pointer passed to it, whether or not the constructor succeeds. This particular example assumes a noexcept default constructor for std::string which is not mandated. However:
This point is generalizable to circumstances not involving std::string.
A std::string default constructor that throws is lame.
std::vector stores the length of both the size of the variable and the size of the allocated data along with the pointer to the data it's self. std::unique_ptr just stores the pointer so there may be a small gain in using std::unique_ptr.
No one has yet mentioned the vector provides iterators and function such and size() where as unique ptr does not. So if iterators are needed use std::vector
C++14 introduces std::dynarray for that purpose.
Now, between these two constructions :
auto buffer = std::make_unique<int[]>( someCount );
auto buffer = std::vector<int>( someCount, someValue );
The first gives you an uninitialized array of int but the second initializes it with a value ( 0 if not provide ). So if you do not need the memory to be initialized because you will overwrite it somehow later with something more complex than std::fill, choose 1, if not, choose 2.
Objective Part:
No, there probably shouldn't be a significant performance difference between the two (though I suppose it depends on the implementation and you should measure if it's critical).
Subjective Part:
std::vector is going to give you a well known interface with .size() and .at() and iterators, which will play nicely with all sorts of other code. Using std::unique_ptr gives you a more primitive interface and makes you keep track of details (like the size) separately. Therefore, barring other constraints, I would prefer std::vector.

Move semantics with a pointer to an internal buffer

Suppose I have a class which manages a pointer to an internal buffer:
class Foo
{
public:
Foo();
...
private:
std::vector<unsigned char> m_buffer;
unsigned char* m_pointer;
};
Foo::Foo()
{
m_buffer.resize(100);
m_pointer = &m_buffer[0];
}
Now, suppose I also have correctly implemented rule-of-3 stuff including a copy constructor which copies the internal buffer, and then reassigns the pointer to the new copy of the internal buffer:
Foo::Foo(const Foo& f)
{
m_buffer = f.m_buffer;
m_pointer = &m_buffer[0];
}
If I also implement move semantics, is it safe to just copy the pointer and move the buffer?
Foo::Foo(Foo&& f) : m_buffer(std::move(f.m_buffer)), m_pointer(f.m_pointer)
{ }
In practice, I know this should work, because the std::vector move constructor is just moving the internal pointer - it's not actually reallocating anything so m_pointer still points to a valid address. However, I'm not sure if the standard guarantees this behavior. Does std::vector move semantics guarantee that no reallocation will occur, and thus all pointers/iterators to the vector are valid?
I'd do &m_buffer[0] again, simply so that you don't have to ask these questions. It's clearly not obviously intuitive, so don't do it. And, in doing so, you have nothing to lose whatsoever. Win-win.
Foo::Foo(Foo&& f)
: m_buffer(std::move(f.m_buffer))
, m_pointer(&m_buffer[0])
{}
I'm comfortable with it mostly because m_pointer is a view into the member m_buffer, rather than strictly a member in its own right.
Which does all sort of beg the question... why is it there? Can't you expose a member function to give you &m_buffer[0]?
I'll not comment the OP's code. All I'm doing is aswering this question:
Does std::vector move semantics guarantee that no reallocation will occur, and thus all pointers/iterators to the vector are valid?
Yes for the move constructor. It has constant complexity (as specified by 23.2.1/4, table 96 and note B) and for this reason the implementation has no choice other than stealing the memory from the original vector (so no memory reallocation occurs) and emptying the original vector.
No for the move assignment operator. The standard requires only linear complexity (as specified in the same paragraph and table mentioned above) because sometimes a reallocation is required. However, in some cirsunstances, it might have constant complexity (and no reallocation is performed) but it depends on the allocator. (You can read the excelent exposition on moved vectors by Howard Hinnant here.)
A better way to do this may be:
class Foo
{
std::vector<unsigned char> m_buffer;
size_t m_index;
unsigned char* get_pointer() { return &m_buffer[m_index];
};
ie rather than store a pointer to a vector element, store the index of it. That way it will be immune to copying/resizing of the vectors backing store.
The case of move construction is guaranteed to move the buffer from one container to the other, so from the point of view of the newly created object, the operation is fine.
On the other hand, you should be careful with this kind of code, as the donor object is left with a empty vector and a pointer referring to the vector in a different object. This means that after being moved from your object is in a fragile state that might cause issues if anyone accesses the interface and even more importantly if the destructor tries to use the pointer.
While in general there won't be any use of your object after being moved from (the assumption being that to be bound by an rvalue-reference it must be an rvalue), the fact is that you can move out of an lvalue by casting or by using std::move (which is basically a cast), in which case code might actually attempt to use your object.

A vector in a struct - best approach? C++

I am trying to include a vector in my struct.
Here is my struct:
struct Region
{
bool hasPoly;
long size1;
long size2;
long size3;
long size4;
long size5;
long size6;
//Mesh* meshRef; // the mesh with the polygons for this region
long meshRef;
std::vector<int> PVS;
} typedef Region;
Is the vector in this declaration valid or would it make more sense to do a pointer to a vector. In the case of a pointer to a vector, do I need to allocate a new vector. How would I accomplish this?
Thanks!
Edit: The problem is that it ends up causing an error that points to xmemory.h, a file included with the MSVC++ platform.
void construct(pointer _Ptr, _Ty&& _Val)
{ // construct object at _Ptr with value _Val
::new ((void _FARQ *)_Ptr) _Ty(_STD forward<_Ty>(_Val)); // this is the line
}
Interestingly, it does not happen if I allocate it outside of the struct and simply in the function I use. Any ideas?
You can write it like this without the typedef:
struct Region
{
bool hasPoly;
long size1;
long size2;
long size3;
long size4;
long size5;
long size6;
long meshRef;
std::vector<int> PVS;
}; // no typedef required
To answer your questions:
Is the vector in this declaration valid
Yes, it is.
or would it make more sense to do a pointer to a vector.
No, probably not. If you did then you would have to implement copy constructor, assignment operator and destructor for the copy behavior. You would end up with the same but it would be extra work and potentially introduce bugs.
In the case of a pointer to a vector, do I need to allocate a new vector. How would I accomplish this?
You would need to implement the copy constructor, the copy assignment operator and the destructor:
// Copy constructor
Region(const Region & rhs) :
hasPoly(rhs.hasPoly),
// ... copy other members just like hasPoly above, except for PVS below:
PVS(new std::vector<int>(*rhs.PVS))
{
}
// Copy assignment operator
Region & operator=(const Region & rhs)
{
if (this != &rhs)
{
hasPoly = rhs.hasPoly;
// ... copy all fields like hasPoly above, except for PVS below:
delete PVS;
PVS = new std::vector<int>(*rhs.PVS);
}
return *this;
}
// Destructor
Region::~Region()
{
delete PVS;
}
Bottom line: your code is fine. You don't need to change it.
EDIT: Fix assignment operator: check for comparison against this and return *this.
It makes complete sense to do that and you don't need new in any respect, unless you actually want to alias a separate vector. In addition, you don't need any typedef stuff going on here.
It depends on how you use it.
If you want to copy the vector and data when copying the Region struct, then leave it as a non-pointer.
If you don't want it copied over, then you will want some sort of pointer to a vector.
If you use a pointer to a vector, you should be very careful about allocation/deallocation exception safety. If you can't scope your allocation in an exception safe way, then you'll leave a potential for memory leaks.
A couple options are:
Make sure that the code that allocates the vector (and uses the Region) also deallocates the vector, and is itself exception safe. This would require the Region to only exist inside that code's scope.
You could do this by simply allocating the vector on the stack, and pass that to the pointer in the Region. Then make sure you never return a Region object above that stack frame.
You could also use some sort of smart pointer -> vector in your Region.
The vector is fine. Be aware that if you copy this struct, then the vector will be copied with it. So in code with particular performance constraints, treat this struct the same way that you'd treat any other expensive-to-copy type.
In production code, some people would prefer you to use the class keyword rather than the struct keyword to define this class, since the vector member makes it non-POD. If you're the author of your own style guide there's nothing to worry about.
The typedef is wrong, though, just write struct Region { stuff };

Prototype for function that allocates memory on the heap (C/C++)

I'm fairly new to C++ so this is probably somewhat of a beginner question. It regards the "proper" style for doing something I suspect to be rather common.
I'm writing a function that, in performing its duties, allocates memory on the heap for use by the caller. I'm curious about what a good prototype for this function should look like. Right now I've got:
int f(char** buffer);
To use it, I would write:
char* data;
int data_length = f(&data);
// ...
delete[] data;
However, the fact that I'm passing a pointer to a pointer tips me off that I'm probably doing this the wrong way.
Anyone care to enlighten me?
In C, that would have been more or less legal.
In C++, functions typically shouldn't do that. You should try to use RAII to guarantee memory doesn't get leaked.
And now you might say "how would it leak memory, I call delete[] just there!", but what if an exception is thrown at the // ... lines?
Depending on what exactly the functions are meant to do, you have several options to consider. One obvious one is to replace the array with a vector:
std::vector<char> f();
std::vector<char> data = f();
int data_length = data.size();
// ...
//delete[] data;
and now we no longer need to explicitly delete, because the vector is allocated on the stack, and its destructor is called when it goes out of scope.
I should mention, in response to comments, that the above implies a copy of the vector, which could potentially be expensive. Most compilers will, if the f function is not too complex, optimize that copy away so this will be fine. (and if the function isn't called too often, the overhead won't matter anyway). But if that doesn't happen, you could instead pass an empty array to the f function by reference, and have f store its data in that instead of returning a new vector.
If the performance of returning a copy is unacceptable, another alternative would be to decouple the choice of container entirely, and use iterators instead:
// definition of f
template <typename iter>
void f(iter out);
// use of f
std::vector<char> vec;
f(std::back_inserter(vec));
Now the usual iterator operations can be used (*out to reference or write to the current element, and ++out to move the iterator forward to the next element) -- and more importantly, all the standard algorithms will now work. You could use std::copy to copy the data to the iterator, for example. This is the approach usually chosen by the standard library (ie. it is a good idea;)) when a function has to return a sequence of data.
Another option would be to make your own object taking responsibility for the allocation/deallocation:
struct f { // simplified for the sake of example. In the real world, it should be given a proper copy constructor + assignment operator, or they should be made inaccessible to avoid copying the object
f(){
// do whatever the f function was originally meant to do here
size = ???
data = new char[size];
}
~f() { delete[] data; }
int size;
char* data;
};
f data;
int data_length = data.size;
// ...
//delete[] data;
And again we no longer need to explicitly delete because the allocation is managed by an object on the stack. The latter is obviously more work, and there's more room for errors, so if the standard vector class (or other standard library components) do the job, prefer them. This example is only if you need something customized to your situation.
The general rule of thumb in C++ is that "if you're writing a delete or delete[] outside a RAII object, you're doing it wrong. If you're writing a new or `new[] outside a RAII object, you're doing it wrong, unless the result is immediately passed to a smart pointer"
In 'proper' C++ you would return an object that contains the memory allocation somewhere inside of it. Something like a std::vector.
Your function should not return a naked pointer to some memory. The pointer, after all, can be copied. Then you have the ownership problem: Who actually owns the memory and should delete it? You also have the problem that a naked pointer might point to a single object on the stack, on the heap, or to a static object. It could also point to an array at these places. Given that all you return is a pointer, how are users supposed to know?
What you should do instead is to return an object that manages its resource in an appropriate way. (Look up RAII.) Give the fact that the resource in this case is an array of char, either a std::string or a std::vector seem to be best:
int f(std::vector<char>& buffer);
std::vector<char> buffer;
int result = f(buffer);
Why not do the same way as malloc() - void* malloc( size_t numberOfBytes )? This way the number of bytes is the input parameter and the allocated block address is the return value.
UPD:
In comments you say that f() basically performs some action besides allocating memory. In this case using std::vector is a much better way.
void f( std::vector<char>& buffer )
{
buffer.clear();
// generate data and add it to the vector
}
the caller will just pass an allocated vector:
std::vector buffer;
f( buffer );
//f.size() now will return the number of elements to work with
Pass the pointer by reference...
int f(char* &buffer)
However you may wish to consider using reference counted pointers such as boost::shared_array to manage the memory if you are just starting this out.
e.g.
int f(boost::shared_array<char> &buffer)
Use RAII (Resource Acquisition Is Initialization) design pattern.
http://en.wikipedia.org/wiki/RAII
Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)
Just return the pointer:
char * f() {
return new char[100];
}
Having said that, you probably do not need to mess with explicit allocation like this - instead of arrays of char, use std::string or std::vector<char> instead.
If all f() does with the buffer is to return it (and its length), let it just return the length, and have the caller new it. If f() also does something with the buffer, then do as polyglot suggeted.
Of course, there may be a better design for the problem you want to solve, but for us to suggest anything would require that you provide more context.
The proper style is probably not to use a char* but a std::vector or a std::string depending on what you are using char* for.
About the problem of passing a parameter to be modified, instead of passing a pointer, pass a reference. In your case:
int f(char*&);
and if you follow the first advice:
int f(std::string&);
or
int f(std::vector<char>&);
Actually, the smart thing to do would be to put that pointer in a class. That way you have better control over its destruction, and the interface is much less confusing to the user.
class Cookie {
public:
Cookie () : pointer (new char[100]) {};
~Cookie () {
delete[] pointer;
}
private:
char * pointer;
// Prevent copying. Otherwise we have to make these "smart" to prevent
// destruction issues.
Cookie(const Cookie&);
Cookie& operator=(const Cookie&);
};
Provided that f does a new[] to match, it will work, but it's not very idiomatic.
Assuming that f fills in the data and is not just a malloc()-alike you would be better wrapping the allocation up as a std::vector<char>
void f(std::vector<char> &buffer)
{
// compute length
int len = ...
std::vector<char> data(len);
// fill in data
...
buffer.swap(data);
}
EDIT -- remove the spurious * from the signature
I guess you are trying to allocate a one dimensional array. If so, you don't need to pass a pointer to pointer.
int f(char* &buffer)
should be sufficient. And the usage scenario would be:
char* data;
int data_length = f(data);
// ...
delete[] data;