I've always seen pointers as a specialization of iterators, used in the particular case where objects are stored contiguously in memory. However, I found out that the Allocator member type pointer (which then defines the same member type on iterators and containers) is not required to be a real pointer but it may be what's called a fancy pointer that, according to cppreference,
is used to access objects allocated in address spaces that differ from the homogeneous virtual address space.
This is what I used to call an iterator, not a pointer, but that's not the point. I'm wondering why an Allocator is not required to allocate contiguous memory. Imagine passing std::vector a custom allocator that does not allocate objects in a contiguous manner: to me, that's not a vector anymore. If I don't want objects to be contiguous in memory, I use a list rather than a vector with a custom allocator. Fancy pointers just look like a big source of confusion; why were they introduced?
So, C++ as a language has a concept of a pointer. For any type T*, the language requires certain specific things out of it. In particular, sizeof(T*) is statically-determined, and all language pointers have the same size.
In the long-before time, however, what T* could address did not always reflect hardware capabilities. Imagine hardware with two pools of memory whose maximum sizes differ. If the compiler mapped T* to the smaller pool, then a T* could not point into the larger pool of memory: the address space of the larger memory would exceed what T* can represent.
This led to the use of things like LONG_POINTER for "pointing" into memory outside of the T*-accessible space.
The goal of the allocator::pointer concept is to be able to write an allocator that can allow any allocator-aware container to work with such non-T*-compatible pools of memory transparently.
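A minimal compile-time sketch of that plumbing, assuming a hypothetical fancy pointer far_ptr and allocator far_allocator (neither is a real library type): the allocator publishes the fancy pointer as its pointer member, and std::allocator_traits propagates it to allocator-aware containers.

#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>

template <typename T>
struct far_ptr {
    std::uint32_t segment;  // hypothetical extra addressing state
    std::uint32_t offset;
    // A real fancy pointer would also implement dereference, arithmetic,
    // comparisons, and std::pointer_traits support.
};

template <typename T>
struct far_allocator {
    using value_type = T;
    using pointer    = far_ptr<T>;  // containers see this via allocator_traits

    pointer allocate(std::size_t n);            // would carve from the far pool
    void deallocate(pointer p, std::size_t n);  // would return it to the pool
};

// allocator_traits hands the fancy pointer type to container internals:
static_assert(std::is_same<
                  std::allocator_traits<far_allocator<int>>::pointer,
                  far_ptr<int>>::value,
              "a container using far_allocator stores far_ptr<int> internally");

int main() {}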
Of course... that is no longer relevant for most hardware. Virtual memory, 32 and 64-bit addressing, and the like are now standard. As such, if there's memory that's not directly accessible and you want to address it, the expectation is that you'll convert such memory into virtual addresses via OS calls which can then be used with language pointers just fine. See mapping of files and GPU memory.
It was intended to solve a problem at the library level, but the problem was instead solved at the hardware/OS level. Thus, it's an unnecessary complication of allocators and allocator-aware containers, but it remains present because of decisions made nearly 25 years ago.
I've always seen pointers as a specialization of iterators
Iteration of arrays is one of the uses of pointers. It is not their only purpose. They are also used as a "handle" to identify dynamic allocations.
why were [fancy pointers] introduced?
The page that you linked explains a reason for their introduction:
Such pointers were introduced to support segmented memory architectures ...
These days, such architectures are quite rare, but programmers have found other use cases:
... and are used today to access objects allocated in address spaces that differ from the homogeneous virtual address space that is accessed by raw pointers. An example of a fancy pointer is the mapping address-independent pointer boost::interprocess::offset_ptr, which makes it possible to allocate node-based data structures such as std::set in shared memory and memory mapped files mapped in different addresses in every process.
The standard paper P0773R0 linked in the linked page has a more detailed list of purposes:
Scenario A. "Offset" pointers with a modified bit-level representation.
Scenario B. Small-data-area "near" pointers.
Scenario C. High-memory-area "far" pointers.
Scenario D. "Fat" pointers carrying metadata for dereference-time.
Scenario E. "Segmented" pointers carrying metadata for deallocate-time.
Note that not all standard library implementations support all use cases of fancy pointers for all standard containers as explored in P0773R0.
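For a concrete taste of Scenario A, here is a sketch adapted from the Boost.Interprocess documentation: a std::set whose nodes live in shared memory, linked together by the fancy pointer boost::interprocess::offset_ptr. Per the caveat above, whether your standard library's std::set fully supports fancy pointers varies (that is what P0773R0 explores); Boost's own container aliases such as boost::interprocess::set are the portable fallback.

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <functional>
#include <set>

namespace bip = boost::interprocess;

// The allocator's pointer type is bip::offset_ptr<int>, an address-independent
// fancy pointer storing an offset rather than an absolute address.
using ShmAllocator =
    bip::allocator<int, bip::managed_shared_memory::segment_manager>;
using ShmSet = std::set<int, std::less<int>, ShmAllocator>;

int main() {
    bip::shared_memory_object::remove("MySharedMemory");
    bip::managed_shared_memory segment(bip::create_only, "MySharedMemory", 65536);

    // Construct the set inside the segment; its internal node pointers are
    // offset_ptrs, so another process can map the segment at a different
    // address and still traverse the structure.
    ShmSet* s = segment.construct<ShmSet>("MySet")(
        std::less<int>(), ShmAllocator(segment.get_segment_manager()));

    s->insert(42);

    bip::shared_memory_object::remove("MySharedMemory");
}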
Related
I'm starting with the assumption that, generally, it is a good idea to allocate small objects on the stack and big objects in dynamic memory. Another assumption is that I'm possibly confused while trying to learn about memory, STL containers and smart pointers.
Consider the following example, where I have an object that is necessarily allocated in the free store through a smart pointer, and I can rely on clients getting said object from a factory, for instance. This object contains some data that is specifically allocated using an STL container, which happens to be a std::vector. In one case, this data vector itself is dynamically allocated using some smart pointer, and in the other situation I just don't use a smart pointer.
Is there any practical difference between design A and design B, described below?
Situation A:
class SomeClass{
public:
SomeClass(){ /* initialize some potentially big STL container */ }
private:
std::vector<double> dataVector_;
};
Situation B:
class SomeOtherClass{
public:
SomeOtherClass() { /* initialize some potentially big STL container,
but is it allocated in any different way? */ }
private:
std::unique_ptr<std::vector<double>> pDataVector_;
};
Some factory functions.
std::unique_ptr<SomeClass> someClassFactory(){
return std::make_unique<SomeClass>();
}
std::unique_ptr<SomeOtherClass> someOtherClassFactory(){
return std::make_unique<SomeOtherClass>();
}
Use case:
int main(){
//in my case I can reliably assume that objects themselves
//are going to always be allocated in dynamic memory
auto pSomeClassObject(someClassFactory());
auto pSomeOtherClassObject(someOtherClassFactory());
return 0;
}
I would expect that both design choices have the same outcome, but do they?
Is there any advantage or disadvantage to choosing A or B? Specifically, should I generally choose design A because it's simpler, or are there more considerations? Is B morally wrong because the pointer to the std::vector can be left dangling?
tl;dr : Is it wrong to have a smart pointer pointing to a STL container?
edit:
The related answers pointed to useful additional information for someone as confused as myself.
Usage of objects or pointers to objects as class members and memory allocation
and Class members that are objects - Pointers or not? C++
And changing some Google keywords led me to When vectors are allocated, do they use memory on the heap or the stack?
std::unique_ptr<std::vector<double>> is slower, takes more memory, and the only advantage is that it contains an additional possible state: "vector doesn't exist". However, if you care about that state, use boost::optional<std::vector<double>> instead. You should almost never have a heap-allocated container, and definitely never hold one through a unique_ptr. It actually works fine, there is no "dangling"; it's just pointlessly slow.
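A small sketch of that alternative using std::optional, the C++17 counterpart of the boost::optional suggested above; the Widget name is just for illustration. The optional stores the vector object inline, so the "doesn't exist" state costs no extra heap block:

#include <optional>
#include <vector>

struct Widget {
    std::optional<std::vector<double>> data;  // disengaged == "no vector yet"
};

int main() {
    Widget w;
    if (!w.data)
        w.data.emplace(1000, 0.0);  // construct the vector in place
    w.data->push_back(3.14);        // use it like a normal vector
}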
Using std::unique_ptr here is just wasteful, unless your goal is a compiler firewall (hiding the compile-time dependency on vector, but then you'd need a forward declaration of a standard container, which is not allowed).
You're adding an indirection but, more importantly, the full contents of SomeClass turn into 3 separate memory blocks to load when accessing the contents (the SomeClass block containing the unique_ptr, pointing to std::vector's block, pointing to its element array). In addition, you're paying one extra, superfluous level of heap overhead.
Now you might start imagining scenarios where an indirection is helpful to the vector, like maybe you can shallow move/swap the unique_ptrs between two SomeClass instances. Yes, but vector already provides that without a unique_ptr wrapper on top. And it already has states like empty that you can reuse for some concept of validity/nilness.
Remember that variable-sized containers themselves are small objects, not big ones, pointing to potentially big blocks. vector isn't big; its dynamic contents can be. The idea of adding indirections for big objects isn't a bad rule of thumb, but vector is not a big object. Before move semantics there were more reasons to think of something like std::vector as one indivisibly large object (though its contents were always swappable), but now it's worth thinking of it as a little handle pointing to big, dynamic contents that can be shallow-moved and swapped cheaply.
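To make that concrete, a tiny sketch showing that moving or swapping a std::vector only exchanges its small internal handle (pointer, size, capacity); the element block itself is never copied, so no unique_ptr wrapper is needed to get cheap transfers:

#include <cassert>
#include <utility>
#include <vector>

int main() {
    std::vector<double> a(1000000, 1.0);
    const double* elems = a.data();

    std::vector<double> b = std::move(a);  // O(1): b steals a's buffer
    assert(b.data() == elems);             // same element block, new owner

    std::vector<double> c;
    c.swap(b);                             // O(1): handles exchanged
    assert(c.data() == elems);
}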
Some common reasons to introduce an indirection through something like unique_ptr are:
Abstraction & hiding. If you're trying to abstract or hide the concrete definition of some type/subtype, Foo, then this is where you need the indirection so that its handle can be captured (or potentially even used with abstraction) by those who don't know exactly what Foo is.
To allow a big, contiguous 1-block-type object to be passed around from owner to owner without invoking a copy or invalidating references/pointers (iterators included) to it or its contents.
A hasty kind of reason that's wasteful but sometimes useful in a deadline rush is to simply introduce a validity/null state to something that doesn't inherently have it.
Occasionally it's useful as an optimization to hoist out certain less frequently-accessed, larger members of an object so that its commonly-accessed elements fit more snugly (and perhaps with adjacent objects) in a cache line. There unique_ptr can let you split apart that object's memory layout while still conforming to RAII.
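A sketch of that last point, with hypothetical type and field names: the hot fields stay small and cache-friendly, while the rarely-touched bulky state sits behind a single RAII-managed indirection.

#include <array>
#include <memory>
#include <string>

struct ColdData {                    // large, rarely accessed
    std::string debug_name;
    std::array<double, 256> calibration{};
};

struct Particle {                    // the hot loop touches only these fields
    float x = 0, y = 0, z = 0;
    float vx = 0, vy = 0, vz = 0;
    std::unique_ptr<ColdData> cold;  // allocated lazily, destroyed via RAII
};

int main() {
    Particle p;
    p.x += p.vx;                     // hot path: no ColdData cache misses
    if (!p.cold)
        p.cold = std::make_unique<ColdData>();
    p.cold->debug_name = "p0";
}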
Now wrapping a shared_ptr on top of a standard container might have more legitimate applications if you have a container that can actually be owned (sensibly) by more than one owner. With unique_ptr, only one owner can possess the object at a time, and standard containers already let you swap and move each other's internal guts (the big, dynamic parts). So there's very little reason I can think of to wrap a standard container directly with a unique_ptr, as it's already somewhat like a smart pointer to a dynamic array (but with more functionality to work with that dynamic data, including deep copying it if desired).
And if we talk about non-standard containers, like say you're working with a third party library that provides some data structures whose contents can get very large but they fail to provide those cheap, non-invalidating move/swap semantics, then you might superficially wrap it around a unique_ptr, exchanging some creation/access/destruction overhead to get those cheap move/swap semantics back as a workaround. For the standard containers, no such workaround is needed.
I agree with @MooingDuck; I don't think using std::unique_ptr has any compelling advantages. However, I could see a use case for std::shared_ptr if the member data is very large and the class is going to support COW (copy-on-write) semantics (or any other use case where the data is shared across multiple instances).
I have lately been trying to create custom containers that are similar to some of the library containers (e.g. vector, list), and while I was using an allocator to allocate dynamic memory I noticed that the idea behind allocators and built-in arrays is the same: allocators reserve a certain amount of raw, unconstructed dynamic memory and return a pointer to the first free location in that pool of memory, and built-in arrays do pretty much the same thing. So if we have a std::allocator for strings called alloc,
this code alloc.allocate(7) and this code string* array = new string[7] should have the same effect. And if we want to construct the raw memory we can call std::allocator::construct, passing it the pointer returned from the allocate function, or we can have something like array[0] = string("something") to do the same thing. Correct?
So what is the difference between how an allocator works and how a built-in array works?
You're right that they're fundamentally related, but not in that way. new string[7] could indeed be decomposed into allocate and construct (with a few extra bits for exception handling and other details).
Separating them out in the allocator interface allows much more fine-grained control for containers so that they can, for example, have memory with non-constructed objects in them, which is often vital for correct performance guarantees or semantics.
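A minimal sketch of that fine-grained control, using std::allocator through std::allocator_traits: raw memory for 7 strings is acquired first, and objects are constructed and destroyed individually, which new string[7] cannot express.

#include <memory>
#include <string>

int main() {
    std::allocator<std::string> alloc;
    using traits = std::allocator_traits<decltype(alloc)>;

    std::string* p = traits::allocate(alloc, 7);  // raw, unconstructed memory

    traits::construct(alloc, p, "something");     // construct only element 0;
                                                  // elements 1..6 stay raw, to
                                                  // be constructed lazily

    traits::destroy(alloc, p);                    // destroy exactly what was built
    traits::deallocate(alloc, p, 7);
}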
Additionally, the allocator interface is, of course, an interface with many possible implementations, such as memory arenas or object pools, which new string[7] really doesn't offer.
Finally, new T[] is awful; don't ever use it. The allocator interface is designed to be used only by fairly experienced programmers in quite limited ways, as a component of a better library component. new T[] is a language feature that everybody can just use, with terrible results.
An array is a contiguous container of slots for items in memory. The array is a range.
An allocator is a function object (or function) that reserves memory. The allocator can designate space from an array, the stack, the heap, or other areas of memory. The allocator can also be used to allocate space outside of main memory, such as on a hard drive or another device (maybe a server, the cloud, etc.).
The space allocated for an array is usually determined by the compiler during the build phase.
An allocator is used for dynamic (during run-time) allocation of objects.
I am working on a project which has two different processes.
The first process is a cache based on a std::map or std::set which allocates all of its data in a shared memory region.
The second process is a producer/consumer which has access to the shared memory, so whenever it needs some data it asks the cache process, through a Unix pipe, for the starting address of the shared memory that contains the requested data.
So far I have come up with two approaches: the first is changing the allocation function for std::set so that it always allocates in the shared memory, or, as a perhaps easier approach, storing a pointer to the shared region as the mapped value:
map<key, pointer to shared region>
Any idea? :D
Thanks!!
In theory, you can use a custom allocator for std::set or std::map to do this. Of course, you'll have to ensure that any contents that might dynamically allocate also use the same custom allocator.
The real problem is that the mapped addresses of the shared memory might not be the same. It's often possible to work around this by using mmap and specifying the address, but the address range must be free in both processes. I've done this under Solaris, which always allocates (or allocated) static and heap at the bottom of the address space, and stack at the top, leaving a big hole in the middle, but even there, I don't think there was any guarantee, and other systems have different policies. Still, if the processes aren't too big otherwise, you may be able to find a solution empirically. (I'd recommend making the address and the size a configuration parameter.)
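A POSIX sketch of that empirical approach; the segment name "/my_cache", the size, and the base address are all assumptions to be made configurable, as suggested above. Without MAP_FIXED the address is only a hint, so a failed guess is detected rather than clobbering existing mappings (64-bit example):

#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const std::size_t size = 1 << 20;  // 1 MiB region (make configurable)
    void* const desired =              // hoped-for base address (make configurable)
        reinterpret_cast<void*>(0x600000000000);

    int fd = shm_open("/my_cache", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { std::perror("shm_open"); return 1; }
    if (ftruncate(fd, size) != 0) { std::perror("ftruncate"); return 1; }

    void* p = mmap(desired, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) { std::perror("mmap"); return 1; }

    if (p != desired) {
        // The range wasn't free in this process: raw pointers into the region
        // won't match the other process, so fall back to offset-based pointers.
        std::fprintf(stderr, "mapped at %p instead of %p\n", p, desired);
    }
    munmap(p, size);
}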
Alternatively, in theory, the allocator defines a pointer type, which the container should use; you should be able to define a pointer type which works with just the offset into the shared memory. I've no experience with this, however, and I fear that it could be very tricky, since the reference type will still be a true reference (and thus a pointer under the hood), and you cannot change this.
I've recently been fiddling with developing a custom allocator based on a memory pool, that is shared between multiple instances of the allocator.
The intention was that the allocator be compatible with STL and Standard C++ based containers such as vector, deque, map, string etc
However something in particular has caused me some confusion. Various implementations of the containers such as std::vector, std::string make use of Small Buffer Optimisation - stack based allocation for small initial memory requirements.
For example MSVC9.1 has the following member in the basic_string class:
union _Bxty
{ // storage for small buffer or pointer to larger one
_Elem _Buf[_BUF_SIZE];
_Elem *_Ptr;
char _Alias[_BUF_SIZE]; // to permit aliasing
} _Bx;
I can't see how, when instantiating such containers, one can cajole the implementation to only and always use the provided allocator and never use SBO. I ask because one of my intentions in implementing custom allocators was to be able to use them in a shared memory context, where the amount of shared memory may be less than the SBO limit that some of the various implementations use.
For example, I would like to have a situation where I can have two instances of std::string, one per process, sharing a common block of memory which may be smaller than or equal to the SBO upper limit.
Possibly related: May std::vector make use of small buffer optimization?
typedef std::vector<int,mysharedmemallocator> shmvtype;
shmvtype v(2,0); //<-- if SBO then error as memory is allocated on
//stack not via the allocator
v[1] = 1234; //<-- if SBO then error as wrong piece of memory
// is being modified.
Let's look at another example that is not based on shared memory, since that seems to overcomplicate things for some people. Let's say I want to specialize my std::basic_string or std::vector etc. with an allocator that fills the memory it allocates with the value 0xAB prior to presenting the pointer back to the calling entity, for no reason other than whimsy.
A container that is specialised with this new allocator, but that also uses SBO, will not have its SBO-based memory filled with the 0xAB pattern. So for example:
typedef std::basic_string<char, myfillmemallocator> stype;
stype s;
s.resize(2);
assert((unsigned char)s[0] == 0xAB); // if SBO this will fail.
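The myfillmemallocator above is left undefined; a minimal sketch of what it might look like follows. (Note that element construction normally overwrites the fill pattern, so it is only observable in storage that has been allocated but not yet constructed.)

#include <cstddef>
#include <cstring>
#include <memory>

template <typename T>
struct myfillmemallocator {
    using value_type = T;

    myfillmemallocator() = default;
    template <typename U>
    myfillmemallocator(const myfillmemallocator<U>&) {}

    T* allocate(std::size_t n) {
        T* p = std::allocator<T>().allocate(n);
        std::memset(p, 0xAB, n * sizeof(T));  // the whimsical fill
        return p;
    }
    void deallocate(T* p, std::size_t n) {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const myfillmemallocator<T>&, const myfillmemallocator<U>&) {
    return true;
}
template <typename T, typename U>
bool operator!=(const myfillmemallocator<T>&, const myfillmemallocator<U>&) {
    return false;
}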
one of intentions of implementing custom allocators was to be able to use them in a shared memory context
This may be what you intend to do with it, but that's not why they exist. Indeed, with the exception of basic_string in C++98/03, it is not legal to share allocated memory between objects at all. They can share allocator objects, so they can get their memory from the same place. But it is illegal for modifications of one object to impact another that is unrelated; each instance must be separate.
Copy-on-write strings only work because the system assumes that any non-const access to a character will write to it, thus performing a copy. And in C++11, even basic_string is forbidden from doing copy-on-write-style stuff like this.
For example I would like to have a situation where I can have two instances of std::string one per process sharing a common block of memory which maybe smaller than or equal to the SBO upper limit.
That's not possible without writing your own class. The allocator only controls where the memory comes from. What you're wanting is a guaranteed copy-on-write string or some sort of shared string class.
What you want requires a container class specifically designed for this purpose.
I have a notion that the C++ runtime doesn't do any heap compaction, which means that the address of an object created on the heap never changes. I want to confirm whether this is true, and also whether it is true for every platform (Win32, Mac, ...).
The C++ standard says nothing about a heap, nor about compaction. However, it does require that if you take the address of an object, that address stays the same throughout the object's lifetime.
A C++ implementation could do some kind of heap compaction and move objects around behind the scenes. But then the "addresses" it returns to you when you use the address-of operator are not actually memory addresses but some other kind of mapping.
In other words, yes, it is safe to assume that addresses in C++ stay the same while the object you're taking the address of lives.
What happens behind the scenes is unknown. It is possible that the physical memory addresses change (although common C++ compilers wouldn't do this, it might be relevant for compilers targeting various forms of bytecode, such as Flash), but the addresses that your program sees are going to behave nicely.
The standard does not specify it, but then the standard does not specify a heap. This is entirely dependent on your implementation. However, there is nothing stopping an implementation from compacting unused memory while maintaining the same addresses for objects in use.
You are right, it does not change. Pages can be moved around in physical memory, but the virtual memory system (the page tables, cached by the Translation Lookaside Buffer) hides all of that from you.
I'm unaware of any C++ implementation that will move allocated objects around. I suppose it might be technically permitted by the standard (though I'm not 100% sure about that), but remember that the standard must allow a pointer to be cast to a large enough integral type and back again and still be a valid pointer. So an implementation that could move dynamically allocated objects around would have to be able to deal with the admittedly unlikely series of events where:
a pointer is cast to an intptr_t
that value is transformed somehow (xor'ed with some value), so the runtime can't detect that it's a pointer to a particular object
the object gets moved due to compaction
the intptr_t gets transformed back into its original value, and
cast back to a pointer to the object type
The implementation would need to ensure that the pointer from that last step points to the moved object's new location.
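That series of events, spelled out in a small sketch; the 0xDEADBEEF mask is an arbitrary illustrative transformation. The round trip is guaranteed to yield the original pointer value, which is exactly what makes transparent compaction so hard for an implementation:

#include <cassert>
#include <cstdint>

int main() {
    int* obj = new int(42);

    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(obj);  // step 1
    std::uintptr_t hidden = raw ^ 0xDEADBEEF;                    // step 2:
    // no longer recognizable as a pointer to this object

    // step 3 would happen here: a compacting runtime could not find and
    // fix up `hidden`, because it doesn't look like a pointer at all

    std::uintptr_t restored = hidden ^ 0xDEADBEEF;               // step 4
    int* back = reinterpret_cast<int*>(restored);                // step 5

    assert(back == obj && *back == 42);  // must still point at the object
    delete back;
}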
I suppose using double indirection for pointers might allow an implementation to deal with this, but I'm unaware of any implementation that does anything like this.
Under normal circumstances when you're using the system compiler's default runtimes, you can safely assume that pointers will not be invalidated by the runtime.
If you are not using the default memory managers, but a 3rd-party memory manager instead, it completely depends on the runtime and memory manager you are using. While C++ objects do not generally get moved around in memory by the memory manager, you can write a memory manager that compacts free space and you could conceivably write one that would move allocated objects around to maximise free space as well.