Is reusing a memory location safe? - c++

This question is based on some existing C code ported to C++. I am just interested in whether it is "safe". I already know I wouldn't have written it like this. I am aware that the code here is basically C rather than C++ but it's compiled with a C++ compiler and I know that the standards are slightly different sometimes.
I have a function that allocates some memory. I cast the returned void* to an int* and start using it.
Later on I cast the returned void* to a Data* and start using that.
Is this safe in C++?
Example :-
void* data = malloc(10000);
int* data_i = (int*)data;
*data_i = 123;
printf("%d\n", *data_i);
Data* data_d = (Data*)data;
data_d->value = 456;
printf("%d\n", data_d->value);
I never read variables used via a different type than they were stored but worry that the compiler might see that data_i and data_d are different types and so cannot legally alias each other and decide to reorder my code, for example putting the store to data_d before the first printf. Which would break everything.
However this is a pattern that is used all the time. If you insert a free and malloc in between the two accesses I don't believe it alters anything as it doesn't touch the affected memory itself and can reuse the same data.
Is my code broken or is it "correct"?

It's "OK", it works as you have written it (assuming primitives and plain-old-datatypes (PODs)). It is safe. It is effectively a custom memory manager.
Some notes:
If objects with non-trivial destructors are created in the location of the allocated memory, make sure it is called
obj->~obj();
If creating objects, consider the placement new syntax over a plain cast (works with PODs as well)
Object* obj = new (data) Object();
Check for a nullptr (or NULL), if malloc fails, NULL is returned
Alignment shouldn't a problem, but always be aware of it when creating a memory manager and make sure that the alignment is appropriate
Given you are using a C++ compiler, unless you want to keep the "C" nature to the code you can also look to the global operator new().
And as always, once done don't forget the free() (or delete if using new)
You mention that you are not going to convert any of the code just yet; but if or when you do consider it, there are a few idiomatic features in C++ you may wish to use over the malloc or even the global ::operator new.
You should look to the smart pointer std::unique_ptr<> or std::shared_ptr<> and allow them to take care of the memory management issues.

Depending on the definition of Data, your code might be broken. It's bad code, either way.
If Data is a plain old data type (POD, i.e. a typedef for a basic type, a struct of POD types etc.), and the allocated memory is properly aligned for the type (*), then your code is well-defined, which means it will "work" (as long as you initialize each member of *data_d before using it), but it is not good practice. (See below.)
If Data is a non-POD type, you are heading for trouble: The pointer assignment would not have invoked any constructors, for example. data_d, which is of type "pointer to Data", would effectively be lying because it points at something, but that something is not of type Data because no such type has been created / constructed / initialized. Undefined behaviour will be not far off at that point.
The solution for properly constructing an object at a given memory location is called placement new:
Data * data_d = new (data) Data();
This instructs the compiler to construct a Data object at the location data. This will work for POD and non-POD types alike. You will also need to call the destructor (data_d->~Data()) to make sure it is run before deleteing the memory.
Take good care to never mix the allocation / release functions. Whatever you malloc() needs to be free()d, what is allocated with new needs delete, and if you new [] you have to delete []. Any other combination is UB.
In any case, using "naked" pointers for memory ownership is discouraged in C++. You should either
put new in a constructor and the corresponding delete in the destructor of a class, making the object the owner of the memory (including proper deallocation when the object goes out of scope, e.g. in the case of an exception); or
use a smart pointer which effectively does the above for you.
(*): Implementations are known to define "extended" types, the alignment requirements of which are not taken into account by malloc(). I'm not sure if language lawyers would still call them "POD", actually. MSVC, for example, does 8-byte alignment on malloc() but defines the SSE extended type __m128 as having a 16-byte alignment requirement.

The rules surrounding strict aliasing can be quite tricky.
An example of strict aliasing is:
int a = 0;
float* f = reinterpret_cast<float*>(&a);
f = 0.3;
printf("%d", a);
This is a strict aliasing violation because:
the lifetime of the variables (and their use) overlap
they are interpreting the same piece of memory through two different "lenses"
If you are not doing both at the same time, then your code does not violate strict aliasing.
In C++, the lifetime of an object starts when the constructor ends and stops when the destructor starts.
In the case of built-in types (no destructor) or PODs (trivial destructor), the rule is instead that their lifetime ends whenever the memory is either overwritten or freed.
Note: this is specifically to support writing memory managers; after all malloc is written in C and operator new is written in C++ and they are explicitly allowed to pool memory.
I specifically used lenses instead of types because the rule is a bit more difficult.
C++ generally use nominal typing: if two types have a different name, they are different. If you access a value of dynamic type T as if it were a U, then you are violating aliasing.
There are a number of exceptions to this rule:
access by base class
in PODs, access as a pointer to the first attribute
And the most complicated rule is related to union where C++ shifts to structural typing: you can access a piece of memory through two different types, if you only access parts at the beginning of this piece of memory in which the two types share a common initial sequence.
§9.2/18 If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
Given:
struct A { int a; };
struct B: A { char c; double d; };
struct C { int a; char c; char* z; };
Within a union X { B b; C c; }; you can access x.b.a, x.b.c and x.c.a, x.c.c at the same time; however accessing x.b.d (respectively x.c.z) is a violation of aliasing if the currently stored type is not B (respectively not C).
Note: informally, structural typing is like mapping down the type to a tuple of its fields (flattening them).
Note: char* is specifically exempt from this rule, you can view any piece of memory through char*.
In your case, without the definition of Data I cannot say whether the "lenses" rule could be violated, however since you are:
overwriting memory with Data before accessing it through Data*
not accessing it through int* afterwards
then you are compliant with the lifetime rule, and thus there is no aliasing taking place as far as the language is concerned.

As long as the memory is used for only one thing at a time it's safe. You're basically use the allocated data as a union.
If you want to use the memory for instances of classes and not only simple C-style structures or data-types, you have to remember to do placement new to "allocate" the objects, as this will actually call the constructor of the object. The destructor you have to call explicitly when you're done with the object, you can't delete it.

As long as you only handle "C"-types, this would be ok. But as soon as you use C++ classes you will get into trouble with proper initialization. If we assume that Data would be std::string for example, the code would be very wrong.
The compiler cannot really move the store across the call to printf, because that is a visible side effect. The result has to be as if the side effects are produced in the order the program prescribes.

Effectively, you've implemented your own allocator on top of malloc/free that reuses a block in this case. That's perfectly safe. Allocator wrappers can certainly reuse blocks so long as the block is big enough and comes from a source that guarantees sufficient alignment (and malloc does).

As long a Data remains a POD this should be fine. Otherwise you would have to switch to placement new.
I would however put a static assert in place so that this doesn't change during later refactoring

I don't find any mistake in reusing the memory space. Only what I care for is the dangling reference. Reusing memory space as you have said I think it doesn't have any effect on the program.
You can go on with your programming. But it is always preferable to free() the space and then allocate to another variable.

Related

Program with realloc() works just fine but at the end it returns error -1073741819. Can someone please find my mistake in this code? [duplicate]

From what is written here, new allocates in free store while malloc uses heap and the two terms often mean the same thing.
From what is written here, realloc may move the memory block to a new location. If free store and heap are two different memory spaces, does it mean any problem then?
Specifically I'd like to know if it is safe to use
int* data = new int[3];
// ...
int* mydata = (int*)realloc(data,6*sizeof(int));
If not, is there any other way to realloc memory allocated with new safely? I could allocate new area and memcpy the contents, but from what I understand realloc may use the same area if possible.
You can only realloc that which has been allocated via malloc (or family, like calloc).
That's because the underlying data structures that keep track of free and used areas of memory, can be quite different.
It's likely but by no means guaranteed that C++ new and C malloc use the same underlying allocator, in which case realloc could work for both. But formally that's in UB-land. And in practice it's just needlessly risky.
C++ does not offer functionality corresponding to realloc.
The closest is the automatic reallocation of (the internal buffers of) containers like std::vector.
The C++ containers suffer from being designed in a way that excludes use of realloc.
Instead of the presented code
int* data = new int[3];
//...
int* mydata = (int*)realloc(data,6*sizeof(int));
… do this:
vector<int> data( 3 );
//...
data.resize( 6 );
However, if you absolutely need the general efficiency of realloc, and if you have to accept new for the original allocation, then your only recourse for efficiency is to use compiler-specific means, knowledge that realloc is safe with this compiler.
Otherwise, if you absolutely need the general efficiency of realloc but is not forced to accept new, then you can use malloc and realloc. Using smart pointers then lets you get much of the same safety as with C++ containers.
The only possibly relevant restriction C++ adds to realloc is that C++'s malloc/calloc/realloc must not be implemented in terms of ::operator new, and its free must not be implemented in terms of ::operator delete (per C++14 [c.malloc]p3-4).
This means the guarantee you are looking for does not exist in C++. It also means, however, that you can implement ::operator new in terms of malloc. And if you do that, then in theory, ::operator new's result can be passed to realloc.
In practice, you should be concerned about the possibility that new's result does not match ::operator new's result. C++ compilers may e.g. combine multiple new expressions to use one single ::operator new call. This is something compilers already did when the standard didn't allow it, IIRC, and the standard now does allow it (per C++14 [expr.new]p10). That means that even if you go this route, you still don't have a guarantee that passing your new pointers to realloc does anything meaningful, even if it's no longer undefined behaviour.
In general, don't do that. If you are using user defined types with non-trivial initialization, in case of reallocation-copy-freeing, the destructor of your objects won't get called by realloc. The copy constructor won't be called too, when copying. This may lead to undefined behavior due to an incorrect use of object lifetime (see C++ Standard §3.8 Object lifetime, [basic.life]).
1 The lifetime of an object is a runtime property of the object. An object is said to have non-trivial initialization if it is of a class or aggregate type and it or one of its members is initialized by a constructor other than a trivial default constructor. [ Note: initialization by a trivial copy/move constructor is non-trivial initialization. —end note ]
The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained, and
— if the object has non-trivial initialization, its initialization is complete.
The lifetime of an object of type T ends when:
— if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
— the storage which the object occupies is reused or released.
And later (emphasis mine):
3 The properties ascribed to objects throughout this International Standard apply for a given object only during its lifetime.
So, you really don't want to use an object out of its lifetime.
It is not safe, and it's not elegant.
It might be possible to override new/delete to support the reallocation, but then you may as well consider to use the containers.
In general, no.
There are a slew of things which must hold to make it safe:
Bitwise copying the type and abandoning the source must be safe.
The destructor must be trivial, or you must in-place-destruct the elements you want to deallocate.
Either the constructor is trivial, or you must in-place-construct the new elements.
Trivial types satisfy the above requirements.
In addition:
The new[]-function must pass the request on to malloc without any change, nor do any bookkeeping on the side. You can force this by replacing global new[] and delete[], or the ones in the respective classes.
The compiler must not ask for more memory in order to save the number of elements allocated, or anything else.
There is no way to force that, though a compiler shouldn't save such information if the type has a trivial destructor as a matter of Quality of Implementation.
Yes - if new actually called malloc in the first place (for example, this is how VC++ new works).
No otherwise. do note that once you decide to reallocate the memory (because new called malloc), your code is compiler specific and not portable between compilers anymore.
(I know this answer may upset many developers, but I answer depends on real facts, not just idiomaticy).
That is not safe. Firstly the pointer you pass to realloc must have been obtained from malloc or realloc: http://en.cppreference.com/w/cpp/memory/c/realloc.
Secondly the result of new int [3] need not be the same as the result of the allocation function - extra space may be allocated to store the count of elements.
(And for more complex types than int, realloc wouldn't be safe since it doesn't call copy or move constructors.)
You may be able to (not in all cases), but you shouldn't. If you need to resize your data table, you should use std::vector instead.
Details on how to use it are listed in an other SO question.
These function is mostly used in C.
memset sets the bytes in a block of memory to a specific value.
malloc allocates a block of memory.
calloc, same as malloc. Only difference is that it initializes the bytes to zero.
In C++ the preferred method to allocate memory is to use new.
C: int intArray = (int*) malloc(10 *sizeof(int));
C++: int intArray = new int[10];
C: int intArray = (int*) calloc(10 *sizeof(int));
C++: int intArray = new int10;

How to properly access mapped memory without undefined behavior in C++

I've been trying to figure out how to access a mapped buffer from C++17 without invoking undefined behavior. For this example, I'll use a buffer returned by Vulkan's vkMapMemory.
So, according to N4659 (the final C++17 working draft), section [intro.object] (emphasis added):
The constructs in a C++
program create, destroy, refer to, access, and manipulate objects. An
object
is
created by a definition (6.1), by a
new-expression
(8.3.4), when implicitly changing the active member of a
union (12.3), or when a temporary object is created (7.4, 15.2).
These are, apparently, the only valid ways to create a C++ object. So let's say we get a void* pointer to a mapped region of host-visible (and coherent) device memory (assuming, of course, that all the required arguments have valid values and the call succeeds, and the returned block of memory is of sufficient size and properly aligned):
void* ptr{};
vkMapMemory(device, memory, offset, size, flags, &ptr);
assert(ptr != nullptr);
Now, I wish to access this memory as a float array. The obvious thing to do would be to static_cast the pointer and go on my merry way as follows:
volatile float* float_array = static_cast<volatile float*>(ptr);
(The volatile is included since this is mapped as coherent memory, and thus may be written by the GPU at any point). However, a float array doesn't technically exist in that memory location, at least not in the sense of the quoted excerpt, and thus accessing the memory through such a pointer would be undefined behavior. Therefore, according to my understanding, I'm left with two options:
1. memcpy the data
It should always be possible to use a local buffer, cast it to std::byte* and memcpy the representation over to the mapped region. The GPU will interpret it as instructed in the shaders (in this case, as an array of 32-bit float) and thus problem solved. However, this requires extra memory and extra copies, so I would prefer to avoid this.
2. placement-new the array
It appears that section [new.delete.placement] doesn't impose any restrictions on how the placement address is obtained (it need not be a safely-derived pointer regardless of the implementation's pointer safety). It should, therefore, be possible to create a valid float array via placement-new as follows:
volatile float* float_array = new (ptr) volatile float[sizeInFloats];
The pointer float_array should now be safe to access (within the bounds of the array, or one-past).
So, my questions are the following:
Is the simple static_cast indeed undefined behavior?
Is this placement-new usage well-defined?
Is this technique applicable to similar situations, such as accessing memory-mapped hardware?
As a side note, I've never had an issue by simply casting the returned pointer, I'm just trying to figure out what the proper way to do this would be, according to the letter of the standard.
Short answer
As per the Standard, everything involving hardware-mapped memory is undefined behavior since that concept does not exist for the abstract machine. You should refer to your implementation manual.
Long answer
Even though hardware-mapped memory is undefined behavior by the Standard, we can imagine any sane implementation providing some obeys common rules. Some constructs are then more undefined behavior than others (whatever that means).
Is the simple static_cast indeed undefined behavior?
volatile float* float_array = static_cast<volatile float*>(ptr);
Yes, this is undefined behavior and have been discussed many times on StackOverflow.
Is this placement-new usage well-defined?
volatile float* float_array = new (ptr) volatile float[N];
No, even though this looks well defined, this is implementation dependent. As it happens, operator ::new[] is allowed to reserve some overhead1, 2, and you cannot know how much unless you check your toolchain documentation. As a consequence, ::new (dst) T[N] requires an unknown amount of memory greater or equal to N*sizeof T and any dst you allocate might be too small, involving buffer overflow.
How to proceed, then?
A solution would be to manually build a sequence of floats:
auto p = static_cast<volatile float*>(ptr);
for (std::size_t n = 0 ; n < N; ++n) {
::new (p+n) volatile float;
}
Or equivalently, relying on the Standard Library:
#include <memory>
auto p = static_cast<volatile float*>(ptr);
std::uninitialized_default_construct(p, p+N);
This constructs contiguously N uninitialized volatile float objects at the memory pointed to by ptr. This means you must initialize those before reading them; reading an uninitialized object is undefined behavior.
Is this technique applicable to similar situations, such as accessing memory-mapped hardware?
No, again this is really implementation-defined. We can only assume your implementation took reasonable choices, but you should check what its documentation says.
The C++ spec has no concept of mapped memory, so everything to do with it is undefined behavior as far as the C++ spec is concerned. So you need to look to the particular implementation (compiler and operating system) that you are using to see what is defined and what you can do safely.
On most systems, mapping will return memory that came from somewhere else, and may (or may not) have been initialized in a way that is compatible with some specific type. In general, if the memory was originally written as float values of the correct, supported form, then you can safely cast the pointer to a float * and access it that way. But you do need to know how the memory being mapped was originally written.
C++ is compatible with C, and manipulating raw memory is kind-of-what C was perfect for. So don't worry, C++ is perfectly capable of doing what you want.
Edit:- follow this link for the simple answer for C/C++ compatibilty. -
In your example, you do not need to call new at all! To explain...
Not all objects in C++ require construction. These are known as PoD (plain-old-data) types. They are
1) Basic types (floats/ints/enums,etc).
2) All Pointers, but not smart pointers.
3) Arrays of PoD types.
4) Structures that contain only basic types, or other PoD types.
...
5) A class too can be a PoD-type, but the convention is that anything declared "class" should never be relied on to be PoD.
You can test whether a type is PoD using a standard function library object.
Now the only thing that is undefined about casting a pointer to a PoD-type, is that the contents of the structure are not set by anything, so you should treat them as "write-only" values. In your case you may have written to them from the "device" and so initializing them will destroy those values.
(The correct cast is a "reinterpret_cast" by the way)
You are right to worry about alignment issues, but you are wrong to think this is something the C++ code can fix. Alignment is a property of the memory, not a language feature. To align the memory, you must make sure the "offset" is always a multiple of the "alignas" of your structure. On x64/x86, getting this wrong will not create any problems, only slow down the access to your memory. On other systems it can cause a fatal exception.
On the other hand, your memory is not "volatile", it is accessed by another thread. This thread may be on another device, but it is another thread. You need to use thread-safe memory. In C++, this is provided by atomic variables. However, an "atomic" is not a PoD object! You should use a memory fence instead. These primitives force the memory to be read from and to memory. The volatile keyword does this too, but the compiler is allowed to reorder volatile writes and this may cause unexpected results.
Finally, if you want your code to be "modern C++" style, you should do the following.
1) Declare your custom PoD structure to represent your data layout. You can use static_assert(std::is_pod<MyType>::value). This will warn you if the structure is not compatible.
2) Declare a pointer to your type. (In this case only, do not use a smart pointer, unless there is a way to "free" the memory which makes sense)
3) Only allocate memory via a call which returns this pointer type. This function needs to
a) Initialize your pointer type with the result of your call to the Vulkan API.
b) Use an in-place new on the pointer - this is not required if you only write to the data - but is good practice. If you want to default values, initialize them in your structure declaration. If you want to keep the values, simply don't give them default values, and the in-place new will do nothing.
Use an "acquire" fence before reading the memory, a "release" fence after writing. Vulcan may provide a specific mechanism for this, I don't know. It is normal though for all synchronization primitives (such as mutex lock/unlock) to imply a memory fence, so you may get away without this step.

Difference between Object object = new Object() and Object object [duplicate]

I'm coming from a Java background and have started working with objects in C++. But one thing that occurred to me is that people often use pointers to objects rather than the objects themselves, for example this declaration:
Object *myObject = new Object;
rather than:
Object myObject;
Or instead of using a function, let's say testFunc(), like this:
myObject.testFunc();
we have to write:
myObject->testFunc();
But I can't figure out why should we do it this way. I would assume it has to do with efficiency and speed since we get direct access to the memory address. Am I right?
It's very unfortunate that you see dynamic allocation so often. That just shows how many bad C++ programmers there are.
In a sense, you have two questions bundled up into one. The first is when should we use dynamic allocation (using new)? The second is when should we use pointers?
The important take-home message is that you should always use the appropriate tool for the job. In almost all situations, there is something more appropriate and safer than performing manual dynamic allocation and/or using raw pointers.
Dynamic allocation
In your question, you've demonstrated two ways of creating an object. The main difference is the storage duration of the object. When doing Object myObject; within a block, the object is created with automatic storage duration, which means it will be destroyed automatically when it goes out of scope. When you do new Object(), the object has dynamic storage duration, which means it stays alive until you explicitly delete it. You should only use dynamic storage duration when you need it.
That is, you should always prefer creating objects with automatic storage duration when you can.
The main two situations in which you might require dynamic allocation:
You need the object to outlive the current scope - that specific object at that specific memory location, not a copy of it. If you're okay with copying/moving the object (most of the time you should be), you should prefer an automatic object.
You need to allocate a lot of memory, which may easily fill up the stack. It would be nice if we didn't have to concern ourselves with this (most of the time you shouldn't have to), as it's really outside the purview of C++, but unfortunately, we have to deal with the reality of the systems we're developing for.
When you do absolutely require dynamic allocation, you should encapsulate it in a smart pointer or some other type that performs RAII (like the standard containers). Smart pointers provide ownership semantics of dynamically allocated objects. Take a look at std::unique_ptr and std::shared_ptr, for example. If you use them appropriately, you can almost entirely avoid performing your own memory management (see the Rule of Zero).
Pointers
However, there are other more general uses for raw pointers beyond dynamic allocation, but most have alternatives that you should prefer. As before, always prefer the alternatives unless you really need pointers.
You need reference semantics. Sometimes you want to pass an object using a pointer (regardless of how it was allocated) because you want the function to which you're passing it to have access that that specific object (not a copy of it). However, in most situations, you should prefer reference types to pointers, because this is specifically what they're designed for. Note this is not necessarily about extending the lifetime of the object beyond the current scope, as in situation 1 above. As before, if you're okay with passing a copy of the object, you don't need reference semantics.
You need polymorphism. You can only call functions polymorphically (that is, according to the dynamic type of an object) through a pointer or reference to the object. If that's the behavior you need, then you need to use pointers or references. Again, references should be preferred.
You want to represent that an object is optional by allowing a nullptr to be passed when the object is being omitted. If it's an argument, you should prefer to use default arguments or function overloads. Otherwise, you should preferably use a type that encapsulates this behavior, such as std::optional (introduced in C++17 - with earlier C++ standards, use boost::optional).
You want to decouple compilation units to improve compilation time. The useful property of a pointer is that you only require a forward declaration of the pointed-to type (to actually use the object, you'll need a definition). This allows you to decouple parts of your compilation process, which may significantly improve compilation time. See the Pimpl idiom.
You need to interface with a C library or a C-style library. At this point, you're forced to use raw pointers. The best thing you can do is make sure you only let your raw pointers loose at the last possible moment. You can get a raw pointer from a smart pointer, for example, by using its get member function. If a library performs some allocation for you which it expects you to deallocate via a handle, you can often wrap the handle up in a smart pointer with a custom deleter that will deallocate the object appropriately.
There are many use cases for pointers.
Polymorphic behavior. For polymorphic types, pointers (or references) are used to avoid slicing:
class Base { ... };
class Derived : public Base { ... };
void fun(Base b) { ... }
void gun(Base* b) { ... }
void hun(Base& b) { ... }
Derived d;
fun(d); // oops, all Derived parts silently "sliced" off
gun(&d); // OK, a Derived object IS-A Base object
hun(d); // also OK, reference also doesn't slice
Reference semantics and avoiding copying. For non-polymorphic types, a pointer (or a reference) will avoid copying a potentially expensive object
Base b;
fun(b); // copies b, potentially expensive
gun(&b); // takes a pointer to b, no copying
hun(b); // regular syntax, behaves as a pointer
Note that C++11 has move semantics that can avoid many copies of expensive objects into function argument and as return values. But using a pointer will definitely avoid those and will allow multiple pointers on the same object (whereas an object can only be moved from once).
Resource acquisition. Creating a pointer to a resource using the new operator is an anti-pattern in modern C++. Use a special resource class (one of the Standard containers) or a smart pointer (std::unique_ptr<> or std::shared_ptr<>). Consider:
{
auto b = new Base;
... // oops, if an exception is thrown, destructor not called!
delete b;
}
vs.
{
auto b = std::make_unique<Base>();
... // OK, now exception safe
}
A raw pointer should only be used as a "view" and not in any way involved in ownership, be it through direct creation or implicitly through return values. See also this Q&A from the C++ FAQ.
More fine-grained life-time control Every time a shared pointer is being copied (e.g. as a function argument) the resource it points to is being kept alive. Regular objects (not created by new, either directly by you or inside a resource class) are destroyed when going out of scope.
There are many excellent answers to this question, including the important use cases of forward declarations, polymorphism etc. but I feel a part of the "soul" of your question is not answered - namely what the different syntaxes mean across Java and C++.
Let's examine the situation comparing the two languages:
Java:
Object object1 = new Object(); //A new object is allocated by Java
Object object2 = new Object(); //Another new object is allocated by Java
object1 = object2;
//object1 now points to the object originally allocated for object2
//The object originally allocated for object1 is now "dead" - nothing points to it, so it
//will be reclaimed by the Garbage Collector.
//If either object1 or object2 is changed, the change will be reflected to the other
The closest equivalent to this, is:
C++:
Object * object1 = new Object(); //A new object is allocated on the heap
Object * object2 = new Object(); //Another new object is allocated on the heap
delete object1;
//Since C++ does not have a garbage collector, if we don't do that, the next line would
//cause a "memory leak", i.e. a piece of claimed memory that the app cannot use
//and that we have no way to reclaim...
object1 = object2; //Same as Java, object1 points to object2.
Let's see the alternative C++ way:
Object object1; //A new object is allocated on the STACK
Object object2; //Another new object is allocated on the STACK
object1 = object2;//!!!! This is different! The CONTENTS of object2 are COPIED onto object1,
//using the "copy assignment operator", the definition of operator =.
//But, the two objects are still different. Change one, the other remains unchanged.
//Also, the objects get automatically destroyed once the function returns...
The best way to think of it is that -- more or less -- Java (implicitly) handles pointers to objects, while C++ may handle either pointers to objects, or the objects themselves.
There are exceptions to this -- for example, if you declare Java "primitive" types, they are actual values that are copied, and not pointers.
So,
Java:
int object1; //An integer is allocated on the stack.
int object2; //Another integer is allocated on the stack.
object1 = object2; //The value of object2 is copied to object1.
That said, using pointers is NOT necessarily either the correct or the wrong way to handle things; however other answers have covered that satisfactorily. The general idea though is that in C++ you have much more control on the lifetime of the objects, and on where they will live.
Take home point -- the Object * object = new Object() construct is actually what is closest to typical Java (or C# for that matter) semantics.
Preface
Java is nothing like C++, contrary to hype. The Java hype machine would like you to believe that because Java has C++ like syntax, that the languages are similar. Nothing can be further from the truth. This misinformation is part of the reason why Java programmers go to C++ and use Java-like syntax without understanding the implications of their code.
Onwards we go
But I can't figure out why should we do it this way. I would assume it
has to do with efficiency and speed since we get direct access to the
memory address. Am I right?
To the contrary, actually. The heap is much slower than the stack, because the stack is very simple compared to the heap. Automatic storage variables (aka stack variables) have their destructors called once they go out of scope. For example:
{
std::string s;
}
// s is destroyed here
On the other hand, if you use a pointer dynamically allocated, its destructor must be called manually. delete calls this destructor for you.
{
std::string* s = new std::string;
delete s; // destructor called
}
This has nothing to do with the new syntax prevalent in C# and Java. They are used for completely different purposes.
Benefits of dynamic allocation
1. You don't have to know the size of the array in advance
One of the first problems many C++ programmers run into is that when they are accepting arbitrary input from users, you can only allocate a fixed size for a stack variable. You cannot change the size of arrays either. For example:
char buffer[100];
std::cin >> buffer;
// bad input = buffer overflow
Of course, if you used an std::string instead, std::string internally resizes itself so that shouldn't be a problem. But essentially the solution to this problem is dynamic allocation. You can allocate dynamic memory based on the input of the user, for example:
int * pointer;
std::cout << "How many items do you need?";
std::cin >> n;
pointer = new int[n];
Side note: One mistake many beginners make is the usage of
variable length arrays. This is a GNU extension and also one in Clang
because they mirror many of GCC's extensions. So the following
int arr[n] should not be relied on.
Because the heap is much bigger than the stack, one can arbitrarily allocate/reallocate as much memory as he/she needs, whereas the stack has a limitation.
2. Arrays are not pointers
How is this a benefit you ask? The answer will become clear once you understand the confusion/myth behind arrays and pointers. It is commonly assumed that they are the same, but they are not. This myth comes from the fact that pointers can be subscripted just like arrays and because of arrays decay to pointers at the top level in a function declaration. However, once an array decays to a pointer, the pointer loses its sizeof information. So sizeof(pointer) will give the size of the pointer in bytes, which is usually 8 bytes on a 64-bit system.
You cannot assign to arrays, only initialize them. For example:
int arr[5] = {1, 2, 3, 4, 5}; // initialization
int arr[] = {1, 2, 3, 4, 5}; // The standard dictates that the size of the array
// be given by the amount of members in the initializer
arr = { 1, 2, 3, 4, 5 }; // ERROR
On the other hand, you can do whatever you want with pointers. Unfortunately, because the distinction between pointers and arrays are hand-waved in Java and C#, beginners don't understand the difference.
3. Polymorphism
Java and C# have facilities that allow you to treat objects as another, for example using the as keyword. So if somebody wanted to treat an Entity object as a Player object, one could do Player player = Entity as Player; This is very useful if you intend to call functions on a homogeneous container that should only apply to a specific type. The functionality can be achieved in a similar fashion below:
std::vector<Base*> vector;
vector.push_back(&square);
vector.push_back(&triangle);
for (auto& e : vector)
{
auto test = dynamic_cast<Triangle*>(e); // I only care about triangles
if (!test) // not a triangle
e.GenericFunction();
else
e.TriangleOnlyMagic();
}
So say if only Triangles had a Rotate function, it would be a compiler error if you tried to call it on all objects of the class. Using dynamic_cast, you can simulate the as keyword. To be clear, if a cast fails, it returns an invalid pointer. So !test is essentially a shorthand for checking if test is NULL or an invalid pointer, which means the cast failed.
Benefits of automatic variables
After seeing all the great things dynamic allocation can do, you're probably wondering why wouldn't anyone NOT use dynamic allocation all the time? I already told you one reason, the heap is slow. And if you don't need all that memory, you shouldn't abuse it. So here are some disadvantages in no particular order:
It is error-prone. Manual memory allocation is dangerous and you are prone to leaks. If you are not proficient at using the debugger or valgrind (a memory leak tool), you may pull your hair out of your head. Luckily RAII idioms and smart pointers alleviate this a bit, but you must be familiar with practices such as The Rule Of Three and The Rule Of Five. It is a lot of information to take in, and beginners who either don't know or don't care will fall into this trap.
It is not necessary. Unlike Java and C# where it is idiomatic to use the new keyword everywhere, in C++, you should only use it if you need to. The common phrase goes, everything looks like a nail if you have a hammer. Whereas beginners who start with C++ are scared of pointers and learn to use stack variables by habit, Java and C# programmers start by using pointers without understanding it! That is literally stepping off on the wrong foot. You must abandon everything you know because the syntax is one thing, learning the language is another.
1. (N)RVO - Aka, (Named) Return Value Optimization
One optimization many compilers make are things called elision and return value optimization. These things can obviate unnecessary copys which is useful for objects that are very large, such as a vector containing many elements. Normally the common practice is to use pointers to transfer ownership rather than copying the large objects to move them around. This has lead to the inception of move semantics and smart pointers.
If you are using pointers, (N)RVO does NOT occur. It is more beneficial and less error-prone to take advantage of (N)RVO rather than returning or passing pointers if you are worried about optimization. Error leaks can happen if the caller of a function is responsible for deleteing a dynamically allocated object and such. It can be difficult to track the ownership of an object if pointers are being passed around like a hot potato. Just use stack variables because it is simpler and better.
Another good reason to use pointers would be for forward declarations. In a large enough project they can really speed up compile time.
In C++, objects allocated on the stack (using Object object; statement within a block) will only live within the scope they are declared in. When the block of code finishes execution, the object declared are destroyed.
Whereas if you allocate memory on heap, using Object* obj = new Object(), they continue to live in heap until you call delete obj.
I would create an object on heap when I like to use the object not only in the block of code which declared/allocated it.
C++ gives you three ways to pass an object: by pointer, by reference, and by value. Java limits you with the latter one (the only exception is primitive types like int, boolean etc). If you want to use C++ not just like a weird toy, then you'd better get to know the difference between these three ways.
Java pretends that there is no such problem as 'who and when should destroy this?'. The answer is: The Garbage Collector, Great and Awful. Nevertheless, it can't provide 100% protection against memory leaks (yes, java can leak memory). Actually, GC gives you a false sense of safety. The bigger your SUV, the longer your way to the evacuator.
C++ leaves you face-to-face with object's lifecycle management. Well, there are means to deal with that (smart pointers family, QObject in Qt and so on), but none of them can be used in 'fire and forget' manner like GC: you should always keep in mind memory handling. Not only should you care about destroying an object, you also have to avoid destroying the same object more than once.
Not scared yet? Ok: cyclic references - handle them yourself, human. And remember: kill each object precisely once, we C++ runtimes don't like those who mess with corpses, leave dead ones alone.
So, back to your question.
When you pass your object around by value, not by pointer or by reference, you copy the object (the whole object, whether it's a couple of bytes or a huge database dump - you're smart enough to care to avoid latter, aren't you?) every time you do '='. And to access the object's members, you use '.' (dot).
When you pass your object by pointer, you copy just a few bytes (4 on 32-bit systems, 8 on 64-bit ones), namely - the address of this object. And to show this to everyone, you use this fancy '->' operator when you access the members. Or you can use the combination of '*' and '.'.
When you use references, then you get the pointer that pretends to be a value. It's a pointer, but you access the members through '.'.
And, to blow your mind one more time: when you declare several variables separated by commas, then (watch the hands):
Type is given to everyone
Value/pointer/reference modifier is individual
Example:
struct MyStruct
{
int* someIntPointer, someInt; //here comes the surprise
MyStruct *somePointer;
MyStruct &someReference;
};
MyStruct s1; //we allocated an object on stack, not in heap
s1.someInt = 1; //someInt is of type 'int', not 'int*' - value/pointer modifier is individual
s1.someIntPointer = &s1.someInt;
*s1.someIntPointer = 2; //now s1.someInt has value '2'
s1.somePointer = &s1;
s1.someReference = s1; //note there is no '&' operator: reference tries to look like value
s1.somePointer->someInt = 3; //now s1.someInt has value '3'
*(s1.somePointer).someInt = 3; //same as above line
*s1.somePointer->someIntPointer = 4; //now s1.someInt has value '4'
s1.someReference.someInt = 5; //now s1.someInt has value '5'
//although someReference is not value, it's members are accessed through '.'
MyStruct s2 = s1; //'NO WAY' the compiler will say. Go define your '=' operator and come back.
//OK, assume we have '=' defined in MyStruct
s2.someInt = 0; //s2.someInt == 0, but s1.someInt is still 5 - it's two completely different objects, not the references to the same one
But I can't figure out why should we use it like this?
I will compare how it works inside the function body if you use:
Object myObject;
Inside the function, your myObject will get destroyed once this function returns. So this is useful if you don't need your object outside your function. This object will be put on current thread stack.
If you write inside function body:
Object *myObject = new Object;
then Object class instance pointed by myObject will not get destroyed once the function ends, and allocation is on the heap.
Now if you are Java programmer, then the second example is closer to how object allocation works under java. This line: Object *myObject = new Object; is equivalent to java: Object myObject = new Object();. The difference is that under java myObject will get garbage collected, while under c++ it will not get freed, you must somewhere explicitly call `delete myObject;' otherwise you will introduce memory leaks.
Since c++11 you can use safe ways of dynamic allocations: new Object, by storing values in shared_ptr/unique_ptr.
std::shared_ptr<std::string> safe_str = make_shared<std::string>("make_shared");
// since c++14
std::unique_ptr<std::string> safe_str = make_unique<std::string>("make_shared");
also, objects are very often stored in containers, like map-s or vector-s, they will automatically manage a lifetime of your objects.
Technically it is a memory allocation issue, however here are two more practical aspects of this.
It has to do with two things:
1) Scope, when you define an object without a pointer you will no longer be able to access it after the code block it is defined in, whereas if you define a pointer with "new" then you can access it from anywhere you have a pointer to this memory until you call "delete" on the same pointer.
2) If you want to pass arguments to a function you want to pass a pointer or a reference in order to be more efficient. When you pass an Object then the object is copied, if this is an object that uses a lot of memory this might be CPU consuming (e.g. you copy a vector full of data). When you pass a pointer all you pass is one int (depending of implementation but most of them are one int).
Other than that you need to understand that "new" allocates memory on the heap that needs to be freed at some point. When you don't have to use "new" I suggest you use a regular object definition "on the stack".
Well the main question is Why should I use a pointer rather than the object itself? And my answer, you should (almost) never use pointer instead of object, because C++ has references, it is safer then pointers and guarantees the same performance as pointers.
Another thing you mentioned in your question:
Object *myObject = new Object;
How does it work? It creates pointer of Object type, allocates memory to fit one object and calls default constructor, sounds good, right? But actually it isn't so good, if you dynamically allocated memory (used keyword new), you also have to free memory manually, that means in code you should have:
delete myObject;
This calls destructor and frees memory, looks easy, however in big projects may be difficult to detect if one thread freed memory or not, but for that purpose you can try shared pointers, these slightly decreases performance, but it is much easier to work with them.
And now some introduction is over and go back to question.
You can use pointers instead of objects to get better performance while transferring data between function.
Take a look, you have std::string (it is also object) and it contains really much data, for example big XML, now you need to parse it, but for that you have function void foo(...) which can be declarated in different ways:
void foo(std::string xml);
In this case you will copy all data from your variable to function stack, it takes some time, so your performance will be low.
void foo(std::string* xml);
In this case you will pass pointer to object, same speed as passing size_t variable, however this declaration has error prone, because you can pass NULL pointer or invalid pointer. Pointers usually used in C because it doesn't have references.
void foo(std::string& xml);
Here you pass reference, basically it is the same as passing pointer, but compiler does some stuff and you cannot pass invalid reference (actually it is possible to create situation with invalid reference, but it is tricking compiler).
void foo(const std::string* xml);
Here is the same as second, just pointer value cannot be changed.
void foo(const std::string& xml);
Here is the same as third, but object value cannot be changed.
What more I want to mention, you can use these 5 ways to pass data no matter which allocation way you have chosen (with new or regular).
Another thing to mention, when you create object in regular way, you allocate memory in stack, but while you create it with new you allocate heap. It is much faster to allocate stack, but it is kind a small for really big arrays of data, so if you need big object you should use heap, because you may get stack overflow, but usually this issue is solved using STL containers and remember std::string is also container, some guys forgot it :)
Let's say that you have class A that contain class B When you want to call some function of class B outside class A you will simply obtain a pointer to this class and you can do whatever you want and it will also change context of class B in your class A
But be careful with dynamic object
There are many benefits of using pointers to object -
Efficiency (as you already pointed out). Passing objects to
functions mean creating new copies of object.
Working with objects from third party libraries. If your object
belongs to a third party code and the authors intend the usage of their objects through pointers only (no copy constructors etc) the only way you can pass around this
object is using pointers. Passing by value may cause issues. (Deep
copy / shallow copy issues).
if the object owns a resource and you want that the ownership should not be sahred with other objects.
This is has been discussed at length, but in Java everything is a pointer. It makes no distinction between stack and heap allocations (all objects are allocated on the heap), so you don't realize you're using pointers. In C++, you can mix the two, depending on your memory requirements. Performance and memory usage is more deterministic in C++ (duh).
Object *myObject = new Object;
Doing this will create a reference to an Object (on the heap) which has to be deleted explicitly to avoid memory leak.
Object myObject;
Doing this will create an object(myObject) of the automatic type (on the stack) that will be automatically deleted when the object(myObject) goes out of scope.
A pointer directly references the memory location of an object. Java has nothing like this. Java has references that reference the location of object through hash tables. You cannot do anything like pointer arithmetic in Java with these references.
To answer your question, it's just your preference. I prefer using the Java-like syntax.
The key strength of object pointers in C++ is allowing for polymorphic arrays and maps of pointers of the same superclass. It allows, for example, to put parakeets, chickens, robins, ostriches, etc. in an array of Bird.
Additionally, dynamically allocated objects are more flexible, and can use HEAP memory whereas a locally allocated object will use the STACK memory unless it is static. Having large objects on the stack, especially when using recursion, will undoubtedly lead to stack overflow.
One reason for using pointers is to interface with C functions. Another reason is to save memory; for example: instead of passing an object which contains a lot of data and has a processor-intensive copy-constructor to a function, just pass a pointer to the object, saving memory and speed especially if you're in a loop, however a reference would be better in that case, unless you're using an C-style array.
In areas where memory utilization is at its premium , pointers comes handy. For example consider a minimax algorithm, where thousands of nodes will be generated using recursive routine, and later use them to evaluate the next best move in game, ability to deallocate or reset (as in smart pointers) significantly reduces memory consumption. Whereas the non-pointer variable continues to occupy space till it's recursive call returns a value.
I will include one important use case of pointer. When you are storing some object in the base class, but it could be polymorphic.
Class Base1 {
};
Class Derived1 : public Base1 {
};
Class Base2 {
Base *bObj;
virtual void createMemerObects() = 0;
};
Class Derived2 {
virtual void createMemerObects() {
bObj = new Derived1();
}
};
So in this case you can't declare bObj as an direct object, you have to have pointer.
tl;dr: Don't "use a pointer rather than the object itself" (usually)
You asked why you should prefer a pointer rather than the object itself. Well, you shouldn't, as a general rule.
Now, there are indeed multiple exceptions to this rule, and other answers have spelled them out. The thing is, these days, many of these exceptions are no longer valid! Let us consider the exceptions listed in the accepted answer:
You need reference semantics.
If you need reference semantics, use references, not pointers; see #ST3's answer's answer. In fact, one could argue that, in Java, what you pass around are usually references.
You need polymorphism.
If you know the set of classes you'll be working with, very often you can just use an std::variant<ClassA, ClassB, ClassC> (see description here) and operate on them using a visitor pattern. Now, granted, C++'s variant implementation is not the prettiest sight; but I'd usually prefer it over getting down-and-dirty with pointers.
You want to represent that an object is optional
Absolutely don't use pointers for that. You have std::optional, and unlike std::variant, it's quite convenient. Use that instead. nullopt is an empty (or "null") optional. And - it's not a pointer.
You want to decouple compilation units to improve compilation time.
You can use references rather than pointers to achieve this as well. To use Object& in a piece of code, it's sufficient to say class Object;, i.e. to use a forward-declaration.
You need to interface with a C library or a C-style library.
Yeah, well, if you work with code that already uses pointers, then - you have to use pointers yourself, can't get around that :-( and C doesn't have references.
Also, some people may tell you to use pointers to avoid making copies of objects. Well this is not really a problem for return values, due to the return-value and named-return-value optimizations (RVO and NRVO). And in other cases - references avoid copying just fine.
The bottom-line rule is still the same as the accepted answer, though: Only use a pointer when you have a good reason to need one.
PS - If you do need a pointer, you should still avoid using new and delete directly. You will probably be better served by a smart pointer - which is automagically freed (not like in Java, but still).
With pointers ,
can directly talk to the memory.
can prevent lot of memory leaks of a program by manipulating pointers.
"Necessity is the mother of invention."
The most of important difference that I would like to point out is the outcome of my own experience of coding.
Sometimes you need to pass objects to functions. In that case, if your object is of a very big class then passing it as an object will copy its state (which you might not want ..AND CAN BE BIG OVERHEAD) thus resulting in an overhead of copying object .while pointer is fixed 4-byte size (assuming 32 bit). Other reasons are already mentioned above...
There are many excellent answers already, but let me give you one example:
I have an simple Item class:
class Item
{
public:
std::string name;
int weight;
int price;
};
I make a vector to hold a bunch of them.
std::vector<Item> inventory;
I create one million Item objects, and push them back onto the vector. I sort the vector by name, and then do a simple iterative binary search for a particular item name. I test the program, and it takes over 8 minutes to finish executing. Then I change my inventory vector like so:
std::vector<Item *> inventory;
...and create my million Item objects via new. The ONLY changes I make to my code are to use the pointers to Items, excepting a loop I add for memory cleanup at the end. That program runs in under 40 seconds, or better than a 10x speed increase.
EDIT: The code is at http://pastebin.com/DK24SPeW
With compiler optimizations it shows only a 3.4x increase on the machine I just tested it on, which is still considerable.

I don't understand the point of pointers to classes [duplicate]

I'm coming from a Java background and have started working with objects in C++. But one thing that occurred to me is that people often use pointers to objects rather than the objects themselves, for example this declaration:
Object *myObject = new Object;
rather than:
Object myObject;
Or instead of using a function, let's say testFunc(), like this:
myObject.testFunc();
we have to write:
myObject->testFunc();
But I can't figure out why should we do it this way. I would assume it has to do with efficiency and speed since we get direct access to the memory address. Am I right?
It's very unfortunate that you see dynamic allocation so often. That just shows how many bad C++ programmers there are.
In a sense, you have two questions bundled up into one. The first is when should we use dynamic allocation (using new)? The second is when should we use pointers?
The important take-home message is that you should always use the appropriate tool for the job. In almost all situations, there is something more appropriate and safer than performing manual dynamic allocation and/or using raw pointers.
Dynamic allocation
In your question, you've demonstrated two ways of creating an object. The main difference is the storage duration of the object. When doing Object myObject; within a block, the object is created with automatic storage duration, which means it will be destroyed automatically when it goes out of scope. When you do new Object(), the object has dynamic storage duration, which means it stays alive until you explicitly delete it. You should only use dynamic storage duration when you need it.
That is, you should always prefer creating objects with automatic storage duration when you can.
The main two situations in which you might require dynamic allocation:
You need the object to outlive the current scope - that specific object at that specific memory location, not a copy of it. If you're okay with copying/moving the object (most of the time you should be), you should prefer an automatic object.
You need to allocate a lot of memory, which may easily fill up the stack. It would be nice if we didn't have to concern ourselves with this (most of the time you shouldn't have to), as it's really outside the purview of C++, but unfortunately, we have to deal with the reality of the systems we're developing for.
When you do absolutely require dynamic allocation, you should encapsulate it in a smart pointer or some other type that performs RAII (like the standard containers). Smart pointers provide ownership semantics of dynamically allocated objects. Take a look at std::unique_ptr and std::shared_ptr, for example. If you use them appropriately, you can almost entirely avoid performing your own memory management (see the Rule of Zero).
Pointers
However, there are other more general uses for raw pointers beyond dynamic allocation, but most have alternatives that you should prefer. As before, always prefer the alternatives unless you really need pointers.
You need reference semantics. Sometimes you want to pass an object using a pointer (regardless of how it was allocated) because you want the function to which you're passing it to have access that that specific object (not a copy of it). However, in most situations, you should prefer reference types to pointers, because this is specifically what they're designed for. Note this is not necessarily about extending the lifetime of the object beyond the current scope, as in situation 1 above. As before, if you're okay with passing a copy of the object, you don't need reference semantics.
You need polymorphism. You can only call functions polymorphically (that is, according to the dynamic type of an object) through a pointer or reference to the object. If that's the behavior you need, then you need to use pointers or references. Again, references should be preferred.
You want to represent that an object is optional by allowing a nullptr to be passed when the object is being omitted. If it's an argument, you should prefer to use default arguments or function overloads. Otherwise, you should preferably use a type that encapsulates this behavior, such as std::optional (introduced in C++17 - with earlier C++ standards, use boost::optional).
You want to decouple compilation units to improve compilation time. The useful property of a pointer is that you only require a forward declaration of the pointed-to type (to actually use the object, you'll need a definition). This allows you to decouple parts of your compilation process, which may significantly improve compilation time. See the Pimpl idiom.
You need to interface with a C library or a C-style library. At this point, you're forced to use raw pointers. The best thing you can do is make sure you only let your raw pointers loose at the last possible moment. You can get a raw pointer from a smart pointer, for example, by using its get member function. If a library performs some allocation for you which it expects you to deallocate via a handle, you can often wrap the handle up in a smart pointer with a custom deleter that will deallocate the object appropriately.
There are many use cases for pointers.
Polymorphic behavior. For polymorphic types, pointers (or references) are used to avoid slicing:
class Base { ... };
class Derived : public Base { ... };
void fun(Base b) { ... }
void gun(Base* b) { ... }
void hun(Base& b) { ... }
Derived d;
fun(d); // oops, all Derived parts silently "sliced" off
gun(&d); // OK, a Derived object IS-A Base object
hun(d); // also OK, reference also doesn't slice
Reference semantics and avoiding copying. For non-polymorphic types, a pointer (or a reference) will avoid copying a potentially expensive object
Base b;
fun(b); // copies b, potentially expensive
gun(&b); // takes a pointer to b, no copying
hun(b); // regular syntax, behaves as a pointer
Note that C++11 has move semantics that can avoid many copies of expensive objects into function argument and as return values. But using a pointer will definitely avoid those and will allow multiple pointers on the same object (whereas an object can only be moved from once).
Resource acquisition. Creating a pointer to a resource using the new operator is an anti-pattern in modern C++. Use a special resource class (one of the Standard containers) or a smart pointer (std::unique_ptr<> or std::shared_ptr<>). Consider:
{
auto b = new Base;
... // oops, if an exception is thrown, destructor not called!
delete b;
}
vs.
{
auto b = std::make_unique<Base>();
... // OK, now exception safe
}
A raw pointer should only be used as a "view" and not in any way involved in ownership, be it through direct creation or implicitly through return values. See also this Q&A from the C++ FAQ.
More fine-grained life-time control Every time a shared pointer is being copied (e.g. as a function argument) the resource it points to is being kept alive. Regular objects (not created by new, either directly by you or inside a resource class) are destroyed when going out of scope.
There are many excellent answers to this question, including the important use cases of forward declarations, polymorphism etc. but I feel a part of the "soul" of your question is not answered - namely what the different syntaxes mean across Java and C++.
Let's examine the situation comparing the two languages:
Java:
Object object1 = new Object(); //A new object is allocated by Java
Object object2 = new Object(); //Another new object is allocated by Java
object1 = object2;
//object1 now points to the object originally allocated for object2
//The object originally allocated for object1 is now "dead" - nothing points to it, so it
//will be reclaimed by the Garbage Collector.
//If either object1 or object2 is changed, the change will be reflected to the other
The closest equivalent to this, is:
C++:
Object * object1 = new Object(); //A new object is allocated on the heap
Object * object2 = new Object(); //Another new object is allocated on the heap
delete object1;
//Since C++ does not have a garbage collector, if we don't do that, the next line would
//cause a "memory leak", i.e. a piece of claimed memory that the app cannot use
//and that we have no way to reclaim...
object1 = object2; //Same as Java, object1 points to object2.
Let's see the alternative C++ way:
Object object1; //A new object is allocated on the STACK
Object object2; //Another new object is allocated on the STACK
object1 = object2;//!!!! This is different! The CONTENTS of object2 are COPIED onto object1,
//using the "copy assignment operator", the definition of operator =.
//But, the two objects are still different. Change one, the other remains unchanged.
//Also, the objects get automatically destroyed once the function returns...
The best way to think of it is that -- more or less -- Java (implicitly) handles pointers to objects, while C++ may handle either pointers to objects, or the objects themselves.
There are exceptions to this -- for example, if you declare Java "primitive" types, they are actual values that are copied, and not pointers.
So,
Java:
int object1; //An integer is allocated on the stack.
int object2; //Another integer is allocated on the stack.
object1 = object2; //The value of object2 is copied to object1.
That said, using pointers is NOT necessarily either the correct or the wrong way to handle things; however other answers have covered that satisfactorily. The general idea though is that in C++ you have much more control on the lifetime of the objects, and on where they will live.
Take home point -- the Object * object = new Object() construct is actually what is closest to typical Java (or C# for that matter) semantics.
Preface
Java is nothing like C++, contrary to hype. The Java hype machine would like you to believe that because Java has C++ like syntax, that the languages are similar. Nothing can be further from the truth. This misinformation is part of the reason why Java programmers go to C++ and use Java-like syntax without understanding the implications of their code.
Onwards we go
But I can't figure out why should we do it this way. I would assume it
has to do with efficiency and speed since we get direct access to the
memory address. Am I right?
To the contrary, actually. The heap is much slower than the stack, because the stack is very simple compared to the heap. Automatic storage variables (aka stack variables) have their destructors called once they go out of scope. For example:
{
std::string s;
}
// s is destroyed here
On the other hand, if you use a pointer dynamically allocated, its destructor must be called manually. delete calls this destructor for you.
{
std::string* s = new std::string;
delete s; // destructor called
}
This has nothing to do with the new syntax prevalent in C# and Java. They are used for completely different purposes.
Benefits of dynamic allocation
1. You don't have to know the size of the array in advance
One of the first problems many C++ programmers run into is that when they are accepting arbitrary input from users, you can only allocate a fixed size for a stack variable. You cannot change the size of arrays either. For example:
char buffer[100];
std::cin >> buffer;
// bad input = buffer overflow
Of course, if you used an std::string instead, std::string internally resizes itself so that shouldn't be a problem. But essentially the solution to this problem is dynamic allocation. You can allocate dynamic memory based on the input of the user, for example:
int * pointer;
std::cout << "How many items do you need?";
std::cin >> n;
pointer = new int[n];
Side note: One mistake many beginners make is the usage of
variable length arrays. This is a GNU extension and also one in Clang
because they mirror many of GCC's extensions. So the following
int arr[n] should not be relied on.
Because the heap is much bigger than the stack, one can arbitrarily allocate/reallocate as much memory as he/she needs, whereas the stack has a limitation.
2. Arrays are not pointers
How is this a benefit you ask? The answer will become clear once you understand the confusion/myth behind arrays and pointers. It is commonly assumed that they are the same, but they are not. This myth comes from the fact that pointers can be subscripted just like arrays and because of arrays decay to pointers at the top level in a function declaration. However, once an array decays to a pointer, the pointer loses its sizeof information. So sizeof(pointer) will give the size of the pointer in bytes, which is usually 8 bytes on a 64-bit system.
You cannot assign to arrays, only initialize them. For example:
int arr[5] = {1, 2, 3, 4, 5}; // initialization
int arr[] = {1, 2, 3, 4, 5}; // The standard dictates that the size of the array
// be given by the amount of members in the initializer
arr = { 1, 2, 3, 4, 5 }; // ERROR
On the other hand, you can do whatever you want with pointers. Unfortunately, because the distinction between pointers and arrays are hand-waved in Java and C#, beginners don't understand the difference.
3. Polymorphism
Java and C# have facilities that allow you to treat objects as another, for example using the as keyword. So if somebody wanted to treat an Entity object as a Player object, one could do Player player = Entity as Player; This is very useful if you intend to call functions on a homogeneous container that should only apply to a specific type. The functionality can be achieved in a similar fashion below:
std::vector<Base*> vector;
vector.push_back(&square);
vector.push_back(&triangle);
for (auto& e : vector)
{
auto test = dynamic_cast<Triangle*>(e); // I only care about triangles
if (!test) // not a triangle
e.GenericFunction();
else
e.TriangleOnlyMagic();
}
So say if only Triangles had a Rotate function, it would be a compiler error if you tried to call it on all objects of the class. Using dynamic_cast, you can simulate the as keyword. To be clear, if a cast fails, it returns an invalid pointer. So !test is essentially a shorthand for checking if test is NULL or an invalid pointer, which means the cast failed.
Benefits of automatic variables
After seeing all the great things dynamic allocation can do, you're probably wondering why wouldn't anyone NOT use dynamic allocation all the time? I already told you one reason, the heap is slow. And if you don't need all that memory, you shouldn't abuse it. So here are some disadvantages in no particular order:
It is error-prone. Manual memory allocation is dangerous and you are prone to leaks. If you are not proficient at using the debugger or valgrind (a memory leak tool), you may pull your hair out of your head. Luckily RAII idioms and smart pointers alleviate this a bit, but you must be familiar with practices such as The Rule Of Three and The Rule Of Five. It is a lot of information to take in, and beginners who either don't know or don't care will fall into this trap.
It is not necessary. Unlike Java and C# where it is idiomatic to use the new keyword everywhere, in C++, you should only use it if you need to. The common phrase goes, everything looks like a nail if you have a hammer. Whereas beginners who start with C++ are scared of pointers and learn to use stack variables by habit, Java and C# programmers start by using pointers without understanding it! That is literally stepping off on the wrong foot. You must abandon everything you know because the syntax is one thing, learning the language is another.
1. (N)RVO - Aka, (Named) Return Value Optimization
One optimization many compilers make are things called elision and return value optimization. These things can obviate unnecessary copys which is useful for objects that are very large, such as a vector containing many elements. Normally the common practice is to use pointers to transfer ownership rather than copying the large objects to move them around. This has lead to the inception of move semantics and smart pointers.
If you are using pointers, (N)RVO does NOT occur. It is more beneficial and less error-prone to take advantage of (N)RVO rather than returning or passing pointers if you are worried about optimization. Error leaks can happen if the caller of a function is responsible for deleteing a dynamically allocated object and such. It can be difficult to track the ownership of an object if pointers are being passed around like a hot potato. Just use stack variables because it is simpler and better.
Another good reason to use pointers would be for forward declarations. In a large enough project they can really speed up compile time.
In C++, objects allocated on the stack (using Object object; statement within a block) will only live within the scope they are declared in. When the block of code finishes execution, the object declared are destroyed.
Whereas if you allocate memory on heap, using Object* obj = new Object(), they continue to live in heap until you call delete obj.
I would create an object on heap when I like to use the object not only in the block of code which declared/allocated it.
C++ gives you three ways to pass an object: by pointer, by reference, and by value. Java limits you with the latter one (the only exception is primitive types like int, boolean etc). If you want to use C++ not just like a weird toy, then you'd better get to know the difference between these three ways.
Java pretends that there is no such problem as 'who and when should destroy this?'. The answer is: The Garbage Collector, Great and Awful. Nevertheless, it can't provide 100% protection against memory leaks (yes, java can leak memory). Actually, GC gives you a false sense of safety. The bigger your SUV, the longer your way to the evacuator.
C++ leaves you face-to-face with object's lifecycle management. Well, there are means to deal with that (smart pointers family, QObject in Qt and so on), but none of them can be used in 'fire and forget' manner like GC: you should always keep in mind memory handling. Not only should you care about destroying an object, you also have to avoid destroying the same object more than once.
Not scared yet? Ok: cyclic references - handle them yourself, human. And remember: kill each object precisely once, we C++ runtimes don't like those who mess with corpses, leave dead ones alone.
So, back to your question.
When you pass your object around by value, not by pointer or by reference, you copy the object (the whole object, whether it's a couple of bytes or a huge database dump - you're smart enough to care to avoid latter, aren't you?) every time you do '='. And to access the object's members, you use '.' (dot).
When you pass your object by pointer, you copy just a few bytes (4 on 32-bit systems, 8 on 64-bit ones), namely - the address of this object. And to show this to everyone, you use this fancy '->' operator when you access the members. Or you can use the combination of '*' and '.'.
When you use references, then you get the pointer that pretends to be a value. It's a pointer, but you access the members through '.'.
And, to blow your mind one more time: when you declare several variables separated by commas, then (watch the hands):
Type is given to everyone
Value/pointer/reference modifier is individual
Example:
struct MyStruct
{
int* someIntPointer, someInt; //here comes the surprise
MyStruct *somePointer;
MyStruct &someReference;
};
MyStruct s1; //we allocated an object on stack, not in heap
s1.someInt = 1; //someInt is of type 'int', not 'int*' - value/pointer modifier is individual
s1.someIntPointer = &s1.someInt;
*s1.someIntPointer = 2; //now s1.someInt has value '2'
s1.somePointer = &s1;
s1.someReference = s1; //note there is no '&' operator: reference tries to look like value
s1.somePointer->someInt = 3; //now s1.someInt has value '3'
*(s1.somePointer).someInt = 3; //same as above line
*s1.somePointer->someIntPointer = 4; //now s1.someInt has value '4'
s1.someReference.someInt = 5; //now s1.someInt has value '5'
//although someReference is not value, it's members are accessed through '.'
MyStruct s2 = s1; //'NO WAY' the compiler will say. Go define your '=' operator and come back.
//OK, assume we have '=' defined in MyStruct
s2.someInt = 0; //s2.someInt == 0, but s1.someInt is still 5 - it's two completely different objects, not the references to the same one
But I can't figure out why should we use it like this?
I will compare how it works inside the function body if you use:
Object myObject;
Inside the function, your myObject will get destroyed once this function returns. So this is useful if you don't need your object outside your function. This object will be put on current thread stack.
If you write inside function body:
Object *myObject = new Object;
then Object class instance pointed by myObject will not get destroyed once the function ends, and allocation is on the heap.
Now if you are Java programmer, then the second example is closer to how object allocation works under java. This line: Object *myObject = new Object; is equivalent to java: Object myObject = new Object();. The difference is that under java myObject will get garbage collected, while under c++ it will not get freed, you must somewhere explicitly call `delete myObject;' otherwise you will introduce memory leaks.
Since c++11 you can use safe ways of dynamic allocations: new Object, by storing values in shared_ptr/unique_ptr.
std::shared_ptr<std::string> safe_str = make_shared<std::string>("make_shared");
// since c++14
std::unique_ptr<std::string> safe_str = make_unique<std::string>("make_shared");
also, objects are very often stored in containers, like map-s or vector-s, they will automatically manage a lifetime of your objects.
Technically it is a memory allocation issue, however here are two more practical aspects of this.
It has to do with two things:
1) Scope, when you define an object without a pointer you will no longer be able to access it after the code block it is defined in, whereas if you define a pointer with "new" then you can access it from anywhere you have a pointer to this memory until you call "delete" on the same pointer.
2) If you want to pass arguments to a function you want to pass a pointer or a reference in order to be more efficient. When you pass an Object then the object is copied, if this is an object that uses a lot of memory this might be CPU consuming (e.g. you copy a vector full of data). When you pass a pointer all you pass is one int (depending of implementation but most of them are one int).
Other than that you need to understand that "new" allocates memory on the heap that needs to be freed at some point. When you don't have to use "new" I suggest you use a regular object definition "on the stack".
Well the main question is Why should I use a pointer rather than the object itself? And my answer, you should (almost) never use pointer instead of object, because C++ has references, it is safer then pointers and guarantees the same performance as pointers.
Another thing you mentioned in your question:
Object *myObject = new Object;
How does it work? It creates pointer of Object type, allocates memory to fit one object and calls default constructor, sounds good, right? But actually it isn't so good, if you dynamically allocated memory (used keyword new), you also have to free memory manually, that means in code you should have:
delete myObject;
This calls destructor and frees memory, looks easy, however in big projects may be difficult to detect if one thread freed memory or not, but for that purpose you can try shared pointers, these slightly decreases performance, but it is much easier to work with them.
And now some introduction is over and go back to question.
You can use pointers instead of objects to get better performance while transferring data between function.
Take a look, you have std::string (it is also object) and it contains really much data, for example big XML, now you need to parse it, but for that you have function void foo(...) which can be declarated in different ways:
void foo(std::string xml);
In this case you will copy all data from your variable to function stack, it takes some time, so your performance will be low.
void foo(std::string* xml);
In this case you will pass pointer to object, same speed as passing size_t variable, however this declaration has error prone, because you can pass NULL pointer or invalid pointer. Pointers usually used in C because it doesn't have references.
void foo(std::string& xml);
Here you pass reference, basically it is the same as passing pointer, but compiler does some stuff and you cannot pass invalid reference (actually it is possible to create situation with invalid reference, but it is tricking compiler).
void foo(const std::string* xml);
Here is the same as second, just pointer value cannot be changed.
void foo(const std::string& xml);
Here is the same as third, but object value cannot be changed.
What more I want to mention, you can use these 5 ways to pass data no matter which allocation way you have chosen (with new or regular).
Another thing to mention, when you create object in regular way, you allocate memory in stack, but while you create it with new you allocate heap. It is much faster to allocate stack, but it is kind a small for really big arrays of data, so if you need big object you should use heap, because you may get stack overflow, but usually this issue is solved using STL containers and remember std::string is also container, some guys forgot it :)
Let's say that you have class A that contain class B When you want to call some function of class B outside class A you will simply obtain a pointer to this class and you can do whatever you want and it will also change context of class B in your class A
But be careful with dynamic object
There are many benefits of using pointers to object -
Efficiency (as you already pointed out). Passing objects to
functions mean creating new copies of object.
Working with objects from third party libraries. If your object
belongs to a third party code and the authors intend the usage of their objects through pointers only (no copy constructors etc) the only way you can pass around this
object is using pointers. Passing by value may cause issues. (Deep
copy / shallow copy issues).
if the object owns a resource and you want that the ownership should not be sahred with other objects.
This is has been discussed at length, but in Java everything is a pointer. It makes no distinction between stack and heap allocations (all objects are allocated on the heap), so you don't realize you're using pointers. In C++, you can mix the two, depending on your memory requirements. Performance and memory usage is more deterministic in C++ (duh).
Object *myObject = new Object;
Doing this will create a reference to an Object (on the heap) which has to be deleted explicitly to avoid memory leak.
Object myObject;
Doing this will create an object(myObject) of the automatic type (on the stack) that will be automatically deleted when the object(myObject) goes out of scope.
A pointer directly references the memory location of an object. Java has nothing like this. Java has references that reference the location of object through hash tables. You cannot do anything like pointer arithmetic in Java with these references.
To answer your question, it's just your preference. I prefer using the Java-like syntax.
The key strength of object pointers in C++ is allowing for polymorphic arrays and maps of pointers of the same superclass. It allows, for example, to put parakeets, chickens, robins, ostriches, etc. in an array of Bird.
Additionally, dynamically allocated objects are more flexible, and can use HEAP memory whereas a locally allocated object will use the STACK memory unless it is static. Having large objects on the stack, especially when using recursion, will undoubtedly lead to stack overflow.
One reason for using pointers is to interface with C functions. Another reason is to save memory; for example: instead of passing an object which contains a lot of data and has a processor-intensive copy-constructor to a function, just pass a pointer to the object, saving memory and speed especially if you're in a loop, however a reference would be better in that case, unless you're using an C-style array.
In areas where memory utilization is at its premium , pointers comes handy. For example consider a minimax algorithm, where thousands of nodes will be generated using recursive routine, and later use them to evaluate the next best move in game, ability to deallocate or reset (as in smart pointers) significantly reduces memory consumption. Whereas the non-pointer variable continues to occupy space till it's recursive call returns a value.
I will include one important use case of pointer. When you are storing some object in the base class, but it could be polymorphic.
Class Base1 {
};
Class Derived1 : public Base1 {
};
Class Base2 {
Base *bObj;
virtual void createMemerObects() = 0;
};
Class Derived2 {
virtual void createMemerObects() {
bObj = new Derived1();
}
};
So in this case you can't declare bObj as an direct object, you have to have pointer.
tl;dr: Don't "use a pointer rather than the object itself" (usually)
You asked why you should prefer a pointer rather than the object itself. Well, you shouldn't, as a general rule.
Now, there are indeed multiple exceptions to this rule, and other answers have spelled them out. The thing is, these days, many of these exceptions are no longer valid! Let us consider the exceptions listed in the accepted answer:
You need reference semantics.
If you need reference semantics, use references, not pointers; see #ST3's answer's answer. In fact, one could argue that, in Java, what you pass around are usually references.
You need polymorphism.
If you know the set of classes you'll be working with, very often you can just use an std::variant<ClassA, ClassB, ClassC> (see description here) and operate on them using a visitor pattern. Now, granted, C++'s variant implementation is not the prettiest sight; but I'd usually prefer it over getting down-and-dirty with pointers.
You want to represent that an object is optional
Absolutely don't use pointers for that. You have std::optional, and unlike std::variant, it's quite convenient. Use that instead. nullopt is an empty (or "null") optional. And - it's not a pointer.
You want to decouple compilation units to improve compilation time.
You can use references rather than pointers to achieve this as well. To use Object& in a piece of code, it's sufficient to say class Object;, i.e. to use a forward-declaration.
You need to interface with a C library or a C-style library.
Yeah, well, if you work with code that already uses pointers, then - you have to use pointers yourself, can't get around that :-( and C doesn't have references.
Also, some people may tell you to use pointers to avoid making copies of objects. Well this is not really a problem for return values, due to the return-value and named-return-value optimizations (RVO and NRVO). And in other cases - references avoid copying just fine.
The bottom-line rule is still the same as the accepted answer, though: Only use a pointer when you have a good reason to need one.
PS - If you do need a pointer, you should still avoid using new and delete directly. You will probably be better served by a smart pointer - which is automagically freed (not like in Java, but still).
With pointers ,
can directly talk to the memory.
can prevent lot of memory leaks of a program by manipulating pointers.
"Necessity is the mother of invention."
The most of important difference that I would like to point out is the outcome of my own experience of coding.
Sometimes you need to pass objects to functions. In that case, if your object is of a very big class then passing it as an object will copy its state (which you might not want ..AND CAN BE BIG OVERHEAD) thus resulting in an overhead of copying object .while pointer is fixed 4-byte size (assuming 32 bit). Other reasons are already mentioned above...
There are many excellent answers already, but let me give you one example:
I have an simple Item class:
class Item
{
public:
std::string name;
int weight;
int price;
};
I make a vector to hold a bunch of them.
std::vector<Item> inventory;
I create one million Item objects, and push them back onto the vector. I sort the vector by name, and then do a simple iterative binary search for a particular item name. I test the program, and it takes over 8 minutes to finish executing. Then I change my inventory vector like so:
std::vector<Item *> inventory;
...and create my million Item objects via new. The ONLY changes I make to my code are to use the pointers to Items, excepting a loop I add for memory cleanup at the end. That program runs in under 40 seconds, or better than a 10x speed increase.
EDIT: The code is at http://pastebin.com/DK24SPeW
With compiler optimizations it shows only a 3.4x increase on the machine I just tested it on, which is still considerable.

Why should I use a pointer rather than the object itself?

I'm coming from a Java background and have started working with objects in C++. But one thing that occurred to me is that people often use pointers to objects rather than the objects themselves, for example this declaration:
Object *myObject = new Object;
rather than:
Object myObject;
Or instead of using a function, let's say testFunc(), like this:
myObject.testFunc();
we have to write:
myObject->testFunc();
But I can't figure out why should we do it this way. I would assume it has to do with efficiency and speed since we get direct access to the memory address. Am I right?
It's very unfortunate that you see dynamic allocation so often. That just shows how many bad C++ programmers there are.
In a sense, you have two questions bundled up into one. The first is when should we use dynamic allocation (using new)? The second is when should we use pointers?
The important take-home message is that you should always use the appropriate tool for the job. In almost all situations, there is something more appropriate and safer than performing manual dynamic allocation and/or using raw pointers.
Dynamic allocation
In your question, you've demonstrated two ways of creating an object. The main difference is the storage duration of the object. When doing Object myObject; within a block, the object is created with automatic storage duration, which means it will be destroyed automatically when it goes out of scope. When you do new Object(), the object has dynamic storage duration, which means it stays alive until you explicitly delete it. You should only use dynamic storage duration when you need it.
That is, you should always prefer creating objects with automatic storage duration when you can.
The main two situations in which you might require dynamic allocation:
You need the object to outlive the current scope - that specific object at that specific memory location, not a copy of it. If you're okay with copying/moving the object (most of the time you should be), you should prefer an automatic object.
You need to allocate a lot of memory, which may easily fill up the stack. It would be nice if we didn't have to concern ourselves with this (most of the time you shouldn't have to), as it's really outside the purview of C++, but unfortunately, we have to deal with the reality of the systems we're developing for.
When you do absolutely require dynamic allocation, you should encapsulate it in a smart pointer or some other type that performs RAII (like the standard containers). Smart pointers provide ownership semantics of dynamically allocated objects. Take a look at std::unique_ptr and std::shared_ptr, for example. If you use them appropriately, you can almost entirely avoid performing your own memory management (see the Rule of Zero).
Pointers
However, there are other more general uses for raw pointers beyond dynamic allocation, but most have alternatives that you should prefer. As before, always prefer the alternatives unless you really need pointers.
You need reference semantics. Sometimes you want to pass an object using a pointer (regardless of how it was allocated) because you want the function to which you're passing it to have access that that specific object (not a copy of it). However, in most situations, you should prefer reference types to pointers, because this is specifically what they're designed for. Note this is not necessarily about extending the lifetime of the object beyond the current scope, as in situation 1 above. As before, if you're okay with passing a copy of the object, you don't need reference semantics.
You need polymorphism. You can only call functions polymorphically (that is, according to the dynamic type of an object) through a pointer or reference to the object. If that's the behavior you need, then you need to use pointers or references. Again, references should be preferred.
You want to represent that an object is optional by allowing a nullptr to be passed when the object is being omitted. If it's an argument, you should prefer to use default arguments or function overloads. Otherwise, you should preferably use a type that encapsulates this behavior, such as std::optional (introduced in C++17 - with earlier C++ standards, use boost::optional).
You want to decouple compilation units to improve compilation time. The useful property of a pointer is that you only require a forward declaration of the pointed-to type (to actually use the object, you'll need a definition). This allows you to decouple parts of your compilation process, which may significantly improve compilation time. See the Pimpl idiom.
You need to interface with a C library or a C-style library. At this point, you're forced to use raw pointers. The best thing you can do is make sure you only let your raw pointers loose at the last possible moment. You can get a raw pointer from a smart pointer, for example, by using its get member function. If a library performs some allocation for you which it expects you to deallocate via a handle, you can often wrap the handle up in a smart pointer with a custom deleter that will deallocate the object appropriately.
There are many use cases for pointers.
Polymorphic behavior. For polymorphic types, pointers (or references) are used to avoid slicing:
class Base { ... };
class Derived : public Base { ... };
void fun(Base b) { ... }
void gun(Base* b) { ... }
void hun(Base& b) { ... }
Derived d;
fun(d); // oops, all Derived parts silently "sliced" off
gun(&d); // OK, a Derived object IS-A Base object
hun(d); // also OK, reference also doesn't slice
Reference semantics and avoiding copying. For non-polymorphic types, a pointer (or a reference) will avoid copying a potentially expensive object
Base b;
fun(b); // copies b, potentially expensive
gun(&b); // takes a pointer to b, no copying
hun(b); // regular syntax, behaves as a pointer
Note that C++11 has move semantics that can avoid many copies of expensive objects into function argument and as return values. But using a pointer will definitely avoid those and will allow multiple pointers on the same object (whereas an object can only be moved from once).
Resource acquisition. Creating a pointer to a resource using the new operator is an anti-pattern in modern C++. Use a special resource class (one of the Standard containers) or a smart pointer (std::unique_ptr<> or std::shared_ptr<>). Consider:
{
auto b = new Base;
... // oops, if an exception is thrown, destructor not called!
delete b;
}
vs.
{
auto b = std::make_unique<Base>();
... // OK, now exception safe
}
A raw pointer should only be used as a "view" and not in any way involved in ownership, be it through direct creation or implicitly through return values. See also this Q&A from the C++ FAQ.
More fine-grained life-time control Every time a shared pointer is being copied (e.g. as a function argument) the resource it points to is being kept alive. Regular objects (not created by new, either directly by you or inside a resource class) are destroyed when going out of scope.
There are many excellent answers to this question, including the important use cases of forward declarations, polymorphism etc. but I feel a part of the "soul" of your question is not answered - namely what the different syntaxes mean across Java and C++.
Let's examine the situation comparing the two languages:
Java:
Object object1 = new Object(); //A new object is allocated by Java
Object object2 = new Object(); //Another new object is allocated by Java
object1 = object2;
//object1 now points to the object originally allocated for object2
//The object originally allocated for object1 is now "dead" - nothing points to it, so it
//will be reclaimed by the Garbage Collector.
//If either object1 or object2 is changed, the change will be reflected to the other
The closest equivalent to this, is:
C++:
Object * object1 = new Object(); //A new object is allocated on the heap
Object * object2 = new Object(); //Another new object is allocated on the heap
delete object1;
//Since C++ does not have a garbage collector, if we don't do that, the next line would
//cause a "memory leak", i.e. a piece of claimed memory that the app cannot use
//and that we have no way to reclaim...
object1 = object2; //Same as Java, object1 points to object2.
Let's see the alternative C++ way:
Object object1; //A new object is allocated on the STACK
Object object2; //Another new object is allocated on the STACK
object1 = object2;//!!!! This is different! The CONTENTS of object2 are COPIED onto object1,
//using the "copy assignment operator", the definition of operator =.
//But, the two objects are still different. Change one, the other remains unchanged.
//Also, the objects get automatically destroyed once the function returns...
The best way to think of it is that -- more or less -- Java (implicitly) handles pointers to objects, while C++ may handle either pointers to objects, or the objects themselves.
There are exceptions to this -- for example, if you declare Java "primitive" types, they are actual values that are copied, and not pointers.
So,
Java:
int object1; //An integer is allocated on the stack.
int object2; //Another integer is allocated on the stack.
object1 = object2; //The value of object2 is copied to object1.
That said, using pointers is NOT necessarily either the correct or the wrong way to handle things; however other answers have covered that satisfactorily. The general idea though is that in C++ you have much more control on the lifetime of the objects, and on where they will live.
Take home point -- the Object * object = new Object() construct is actually what is closest to typical Java (or C# for that matter) semantics.
Preface
Java is nothing like C++, contrary to hype. The Java hype machine would like you to believe that because Java has C++ like syntax, that the languages are similar. Nothing can be further from the truth. This misinformation is part of the reason why Java programmers go to C++ and use Java-like syntax without understanding the implications of their code.
Onwards we go
But I can't figure out why should we do it this way. I would assume it
has to do with efficiency and speed since we get direct access to the
memory address. Am I right?
To the contrary, actually. The heap is much slower than the stack, because the stack is very simple compared to the heap. Automatic storage variables (aka stack variables) have their destructors called once they go out of scope. For example:
{
std::string s;
}
// s is destroyed here
On the other hand, if you use a pointer dynamically allocated, its destructor must be called manually. delete calls this destructor for you.
{
std::string* s = new std::string;
delete s; // destructor called
}
This has nothing to do with the new syntax prevalent in C# and Java. They are used for completely different purposes.
Benefits of dynamic allocation
1. You don't have to know the size of the array in advance
One of the first problems many C++ programmers run into is that when they are accepting arbitrary input from users, you can only allocate a fixed size for a stack variable. You cannot change the size of arrays either. For example:
char buffer[100];
std::cin >> buffer;
// bad input = buffer overflow
Of course, if you used an std::string instead, std::string internally resizes itself so that shouldn't be a problem. But essentially the solution to this problem is dynamic allocation. You can allocate dynamic memory based on the input of the user, for example:
int * pointer;
std::cout << "How many items do you need?";
std::cin >> n;
pointer = new int[n];
Side note: One mistake many beginners make is the usage of
variable length arrays. This is a GNU extension and also one in Clang
because they mirror many of GCC's extensions. So the following
int arr[n] should not be relied on.
Because the heap is much bigger than the stack, one can arbitrarily allocate/reallocate as much memory as he/she needs, whereas the stack has a limitation.
2. Arrays are not pointers
How is this a benefit you ask? The answer will become clear once you understand the confusion/myth behind arrays and pointers. It is commonly assumed that they are the same, but they are not. This myth comes from the fact that pointers can be subscripted just like arrays and because of arrays decay to pointers at the top level in a function declaration. However, once an array decays to a pointer, the pointer loses its sizeof information. So sizeof(pointer) will give the size of the pointer in bytes, which is usually 8 bytes on a 64-bit system.
You cannot assign to arrays, only initialize them. For example:
int arr[5] = {1, 2, 3, 4, 5}; // initialization
int arr[] = {1, 2, 3, 4, 5}; // The standard dictates that the size of the array
// be given by the amount of members in the initializer
arr = { 1, 2, 3, 4, 5 }; // ERROR
On the other hand, you can do whatever you want with pointers. Unfortunately, because the distinction between pointers and arrays are hand-waved in Java and C#, beginners don't understand the difference.
3. Polymorphism
Java and C# have facilities that allow you to treat objects as another, for example using the as keyword. So if somebody wanted to treat an Entity object as a Player object, one could do Player player = Entity as Player; This is very useful if you intend to call functions on a homogeneous container that should only apply to a specific type. The functionality can be achieved in a similar fashion below:
std::vector<Base*> vector;
vector.push_back(&square);
vector.push_back(&triangle);
for (auto& e : vector)
{
auto test = dynamic_cast<Triangle*>(e); // I only care about triangles
if (!test) // not a triangle
e.GenericFunction();
else
e.TriangleOnlyMagic();
}
So say if only Triangles had a Rotate function, it would be a compiler error if you tried to call it on all objects of the class. Using dynamic_cast, you can simulate the as keyword. To be clear, if a cast fails, it returns an invalid pointer. So !test is essentially a shorthand for checking if test is NULL or an invalid pointer, which means the cast failed.
Benefits of automatic variables
After seeing all the great things dynamic allocation can do, you're probably wondering why wouldn't anyone NOT use dynamic allocation all the time? I already told you one reason, the heap is slow. And if you don't need all that memory, you shouldn't abuse it. So here are some disadvantages in no particular order:
It is error-prone. Manual memory allocation is dangerous and you are prone to leaks. If you are not proficient at using the debugger or valgrind (a memory leak tool), you may pull your hair out of your head. Luckily RAII idioms and smart pointers alleviate this a bit, but you must be familiar with practices such as The Rule Of Three and The Rule Of Five. It is a lot of information to take in, and beginners who either don't know or don't care will fall into this trap.
It is not necessary. Unlike Java and C# where it is idiomatic to use the new keyword everywhere, in C++, you should only use it if you need to. The common phrase goes, everything looks like a nail if you have a hammer. Whereas beginners who start with C++ are scared of pointers and learn to use stack variables by habit, Java and C# programmers start by using pointers without understanding it! That is literally stepping off on the wrong foot. You must abandon everything you know because the syntax is one thing, learning the language is another.
1. (N)RVO - Aka, (Named) Return Value Optimization
One optimization many compilers make are things called elision and return value optimization. These things can obviate unnecessary copys which is useful for objects that are very large, such as a vector containing many elements. Normally the common practice is to use pointers to transfer ownership rather than copying the large objects to move them around. This has lead to the inception of move semantics and smart pointers.
If you are using pointers, (N)RVO does NOT occur. It is more beneficial and less error-prone to take advantage of (N)RVO rather than returning or passing pointers if you are worried about optimization. Error leaks can happen if the caller of a function is responsible for deleteing a dynamically allocated object and such. It can be difficult to track the ownership of an object if pointers are being passed around like a hot potato. Just use stack variables because it is simpler and better.
Another good reason to use pointers would be for forward declarations. In a large enough project they can really speed up compile time.
In C++, objects allocated on the stack (using Object object; statement within a block) will only live within the scope they are declared in. When the block of code finishes execution, the object declared are destroyed.
Whereas if you allocate memory on heap, using Object* obj = new Object(), they continue to live in heap until you call delete obj.
I would create an object on heap when I like to use the object not only in the block of code which declared/allocated it.
C++ gives you three ways to pass an object: by pointer, by reference, and by value. Java limits you with the latter one (the only exception is primitive types like int, boolean etc). If you want to use C++ not just like a weird toy, then you'd better get to know the difference between these three ways.
Java pretends that there is no such problem as 'who and when should destroy this?'. The answer is: The Garbage Collector, Great and Awful. Nevertheless, it can't provide 100% protection against memory leaks (yes, java can leak memory). Actually, GC gives you a false sense of safety. The bigger your SUV, the longer your way to the evacuator.
C++ leaves you face-to-face with object's lifecycle management. Well, there are means to deal with that (smart pointers family, QObject in Qt and so on), but none of them can be used in 'fire and forget' manner like GC: you should always keep in mind memory handling. Not only should you care about destroying an object, you also have to avoid destroying the same object more than once.
Not scared yet? Ok: cyclic references - handle them yourself, human. And remember: kill each object precisely once, we C++ runtimes don't like those who mess with corpses, leave dead ones alone.
So, back to your question.
When you pass your object around by value, not by pointer or by reference, you copy the object (the whole object, whether it's a couple of bytes or a huge database dump - you're smart enough to care to avoid latter, aren't you?) every time you do '='. And to access the object's members, you use '.' (dot).
When you pass your object by pointer, you copy just a few bytes (4 on 32-bit systems, 8 on 64-bit ones), namely - the address of this object. And to show this to everyone, you use this fancy '->' operator when you access the members. Or you can use the combination of '*' and '.'.
When you use references, then you get the pointer that pretends to be a value. It's a pointer, but you access the members through '.'.
And, to blow your mind one more time: when you declare several variables separated by commas, then (watch the hands):
Type is given to everyone
Value/pointer/reference modifier is individual
Example:
struct MyStruct
{
int* someIntPointer, someInt; //here comes the surprise
MyStruct *somePointer;
MyStruct &someReference;
};
MyStruct s1; //we allocated an object on stack, not in heap
s1.someInt = 1; //someInt is of type 'int', not 'int*' - value/pointer modifier is individual
s1.someIntPointer = &s1.someInt;
*s1.someIntPointer = 2; //now s1.someInt has value '2'
s1.somePointer = &s1;
s1.someReference = s1; //note there is no '&' operator: reference tries to look like value
s1.somePointer->someInt = 3; //now s1.someInt has value '3'
*(s1.somePointer).someInt = 3; //same as above line
*s1.somePointer->someIntPointer = 4; //now s1.someInt has value '4'
s1.someReference.someInt = 5; //now s1.someInt has value '5'
//although someReference is not value, it's members are accessed through '.'
MyStruct s2 = s1; //'NO WAY' the compiler will say. Go define your '=' operator and come back.
//OK, assume we have '=' defined in MyStruct
s2.someInt = 0; //s2.someInt == 0, but s1.someInt is still 5 - it's two completely different objects, not the references to the same one
But I can't figure out why should we use it like this?
I will compare how it works inside the function body if you use:
Object myObject;
Inside the function, your myObject will get destroyed once this function returns. So this is useful if you don't need your object outside your function. This object will be put on current thread stack.
If you write inside function body:
Object *myObject = new Object;
then Object class instance pointed by myObject will not get destroyed once the function ends, and allocation is on the heap.
Now if you are Java programmer, then the second example is closer to how object allocation works under java. This line: Object *myObject = new Object; is equivalent to java: Object myObject = new Object();. The difference is that under java myObject will get garbage collected, while under c++ it will not get freed, you must somewhere explicitly call `delete myObject;' otherwise you will introduce memory leaks.
Since c++11 you can use safe ways of dynamic allocations: new Object, by storing values in shared_ptr/unique_ptr.
std::shared_ptr<std::string> safe_str = make_shared<std::string>("make_shared");
// since c++14
std::unique_ptr<std::string> safe_str = make_unique<std::string>("make_shared");
also, objects are very often stored in containers, like map-s or vector-s, they will automatically manage a lifetime of your objects.
Technically it is a memory allocation issue, however here are two more practical aspects of this.
It has to do with two things:
1) Scope, when you define an object without a pointer you will no longer be able to access it after the code block it is defined in, whereas if you define a pointer with "new" then you can access it from anywhere you have a pointer to this memory until you call "delete" on the same pointer.
2) If you want to pass arguments to a function you want to pass a pointer or a reference in order to be more efficient. When you pass an Object then the object is copied, if this is an object that uses a lot of memory this might be CPU consuming (e.g. you copy a vector full of data). When you pass a pointer all you pass is one int (depending of implementation but most of them are one int).
Other than that you need to understand that "new" allocates memory on the heap that needs to be freed at some point. When you don't have to use "new" I suggest you use a regular object definition "on the stack".
Well the main question is Why should I use a pointer rather than the object itself? And my answer, you should (almost) never use pointer instead of object, because C++ has references, it is safer then pointers and guarantees the same performance as pointers.
Another thing you mentioned in your question:
Object *myObject = new Object;
How does it work? It creates pointer of Object type, allocates memory to fit one object and calls default constructor, sounds good, right? But actually it isn't so good, if you dynamically allocated memory (used keyword new), you also have to free memory manually, that means in code you should have:
delete myObject;
This calls destructor and frees memory, looks easy, however in big projects may be difficult to detect if one thread freed memory or not, but for that purpose you can try shared pointers, these slightly decreases performance, but it is much easier to work with them.
And now some introduction is over and go back to question.
You can use pointers instead of objects to get better performance while transferring data between function.
Take a look, you have std::string (it is also object) and it contains really much data, for example big XML, now you need to parse it, but for that you have function void foo(...) which can be declarated in different ways:
void foo(std::string xml);
In this case you will copy all data from your variable to function stack, it takes some time, so your performance will be low.
void foo(std::string* xml);
In this case you will pass pointer to object, same speed as passing size_t variable, however this declaration has error prone, because you can pass NULL pointer or invalid pointer. Pointers usually used in C because it doesn't have references.
void foo(std::string& xml);
Here you pass reference, basically it is the same as passing pointer, but compiler does some stuff and you cannot pass invalid reference (actually it is possible to create situation with invalid reference, but it is tricking compiler).
void foo(const std::string* xml);
Here is the same as second, just pointer value cannot be changed.
void foo(const std::string& xml);
Here is the same as third, but object value cannot be changed.
What more I want to mention, you can use these 5 ways to pass data no matter which allocation way you have chosen (with new or regular).
Another thing to mention, when you create object in regular way, you allocate memory in stack, but while you create it with new you allocate heap. It is much faster to allocate stack, but it is kind a small for really big arrays of data, so if you need big object you should use heap, because you may get stack overflow, but usually this issue is solved using STL containers and remember std::string is also container, some guys forgot it :)
Let's say that you have class A that contain class B When you want to call some function of class B outside class A you will simply obtain a pointer to this class and you can do whatever you want and it will also change context of class B in your class A
But be careful with dynamic object
There are many benefits of using pointers to object -
Efficiency (as you already pointed out). Passing objects to
functions mean creating new copies of object.
Working with objects from third party libraries. If your object
belongs to a third party code and the authors intend the usage of their objects through pointers only (no copy constructors etc) the only way you can pass around this
object is using pointers. Passing by value may cause issues. (Deep
copy / shallow copy issues).
if the object owns a resource and you want that the ownership should not be sahred with other objects.
This is has been discussed at length, but in Java everything is a pointer. It makes no distinction between stack and heap allocations (all objects are allocated on the heap), so you don't realize you're using pointers. In C++, you can mix the two, depending on your memory requirements. Performance and memory usage is more deterministic in C++ (duh).
Object *myObject = new Object;
Doing this will create a reference to an Object (on the heap) which has to be deleted explicitly to avoid memory leak.
Object myObject;
Doing this will create an object(myObject) of the automatic type (on the stack) that will be automatically deleted when the object(myObject) goes out of scope.
A pointer directly references the memory location of an object. Java has nothing like this. Java has references that reference the location of object through hash tables. You cannot do anything like pointer arithmetic in Java with these references.
To answer your question, it's just your preference. I prefer using the Java-like syntax.
The key strength of object pointers in C++ is allowing for polymorphic arrays and maps of pointers of the same superclass. It allows, for example, to put parakeets, chickens, robins, ostriches, etc. in an array of Bird.
Additionally, dynamically allocated objects are more flexible, and can use HEAP memory whereas a locally allocated object will use the STACK memory unless it is static. Having large objects on the stack, especially when using recursion, will undoubtedly lead to stack overflow.
One reason for using pointers is to interface with C functions. Another reason is to save memory; for example: instead of passing an object which contains a lot of data and has a processor-intensive copy-constructor to a function, just pass a pointer to the object, saving memory and speed especially if you're in a loop, however a reference would be better in that case, unless you're using an C-style array.
In areas where memory utilization is at its premium , pointers comes handy. For example consider a minimax algorithm, where thousands of nodes will be generated using recursive routine, and later use them to evaluate the next best move in game, ability to deallocate or reset (as in smart pointers) significantly reduces memory consumption. Whereas the non-pointer variable continues to occupy space till it's recursive call returns a value.
I will include one important use case of pointer. When you are storing some object in the base class, but it could be polymorphic.
Class Base1 {
};
Class Derived1 : public Base1 {
};
Class Base2 {
Base *bObj;
virtual void createMemerObects() = 0;
};
Class Derived2 {
virtual void createMemerObects() {
bObj = new Derived1();
}
};
So in this case you can't declare bObj as an direct object, you have to have pointer.
tl;dr: Don't "use a pointer rather than the object itself" (usually)
You asked why you should prefer a pointer rather than the object itself. Well, you shouldn't, as a general rule.
Now, there are indeed multiple exceptions to this rule, and other answers have spelled them out. The thing is, these days, many of these exceptions are no longer valid! Let us consider the exceptions listed in the accepted answer:
You need reference semantics.
If you need reference semantics, use references, not pointers; see #ST3's answer's answer. In fact, one could argue that, in Java, what you pass around are usually references.
You need polymorphism.
If you know the set of classes you'll be working with, very often you can just use an std::variant<ClassA, ClassB, ClassC> (see description here) and operate on them using a visitor pattern. Now, granted, C++'s variant implementation is not the prettiest sight; but I'd usually prefer it over getting down-and-dirty with pointers.
You want to represent that an object is optional
Absolutely don't use pointers for that. You have std::optional, and unlike std::variant, it's quite convenient. Use that instead. nullopt is an empty (or "null") optional. And - it's not a pointer.
You want to decouple compilation units to improve compilation time.
You can use references rather than pointers to achieve this as well. To use Object& in a piece of code, it's sufficient to say class Object;, i.e. to use a forward-declaration.
You need to interface with a C library or a C-style library.
Yeah, well, if you work with code that already uses pointers, then - you have to use pointers yourself, can't get around that :-( and C doesn't have references.
Also, some people may tell you to use pointers to avoid making copies of objects. Well this is not really a problem for return values, due to the return-value and named-return-value optimizations (RVO and NRVO). And in other cases - references avoid copying just fine.
The bottom-line rule is still the same as the accepted answer, though: Only use a pointer when you have a good reason to need one.
PS - If you do need a pointer, you should still avoid using new and delete directly. You will probably be better served by a smart pointer - which is automagically freed (not like in Java, but still).
With pointers ,
can directly talk to the memory.
can prevent lot of memory leaks of a program by manipulating pointers.
"Necessity is the mother of invention."
The most of important difference that I would like to point out is the outcome of my own experience of coding.
Sometimes you need to pass objects to functions. In that case, if your object is of a very big class then passing it as an object will copy its state (which you might not want ..AND CAN BE BIG OVERHEAD) thus resulting in an overhead of copying object .while pointer is fixed 4-byte size (assuming 32 bit). Other reasons are already mentioned above...
There are many excellent answers already, but let me give you one example:
I have an simple Item class:
class Item
{
public:
std::string name;
int weight;
int price;
};
I make a vector to hold a bunch of them.
std::vector<Item> inventory;
I create one million Item objects, and push them back onto the vector. I sort the vector by name, and then do a simple iterative binary search for a particular item name. I test the program, and it takes over 8 minutes to finish executing. Then I change my inventory vector like so:
std::vector<Item *> inventory;
...and create my million Item objects via new. The ONLY changes I make to my code are to use the pointers to Items, excepting a loop I add for memory cleanup at the end. That program runs in under 40 seconds, or better than a 10x speed increase.
EDIT: The code is at http://pastebin.com/DK24SPeW
With compiler optimizations it shows only a 3.4x increase on the machine I just tested it on, which is still considerable.