C++: Deleting objects from memory

Let's say I have allocated some memory and filled it with a set of objects of the same type; we'll call these components.
Say one of these components needs to be removed, what is a good way of doing this such that the "hole" created by the component can be tested for and skipped by a loop iterating over the set of objects?
The inverse should also hold: I would like to be able to test for a hole in order to store new components in that space.
I'm thinking of clearing the memory to zero (memset) & checking for 0...

boost::optional<component> seems to fit your needs exactly. Put those in your storage, whatever that happens to be. For example, with std::vector:
#include <algorithm>
#include <vector>
#include <boost/optional.hpp>

// initialize the vector with 100 empty (non-)components
std::vector<boost::optional<component>> components(100);

// adding a component at position 15
components[15] = component(x, y, z);

// deleting the component at position 82
components[82].reset();

// looping through and checking for existence
for (auto& opt : components)
{
    if (opt) // component exists
    {
        operate_on_component(*opt);
    }
    else // component does not exist
    {
        // whatever
    }
}

// move components to the front, non-components to the back
std::partition(components.begin(), components.end(),
    [](boost::optional<component> const& opt) { return static_cast<bool>(opt); });

The short answer is that it depends on how you store it in memory.
For example, the C++ standard requires that a vector's elements be stored contiguously.
If you can predict the size of the object, you may be able to use sizeof and pointer arithmetic to predict its location in memory.
Good luck.

There are at least two solutions:
1) Mark the hole with some flag and then skip it when processing. Benefit: 'deletion' is very fast (just set a flag). If the object is not that small, even adding a "bool alive" flag is not much of a cost.
2) Move the hole to the end of the pool by replacing it with a live object (swap-and-pop); see the sketch below.
This problem comes up when storing and processing particle systems; you can find some suggestions in that literature.
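A minimal sketch of approach 2, assuming the components live in a std::vector called pool (the names are illustrative):
#include <cstddef>
#include <utility>
#include <vector>

struct component { /* ... */ };

std::vector<component> pool;

// Remove the element at 'index' by overwriting it with the last live
// element and shrinking the pool; order is not preserved, but no hole
// is left behind.
void remove_at(std::size_t index)
{
    if (index != pool.size() - 1)
        pool[index] = std::move(pool.back());
    pool.pop_back();
}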

If it is not possible to move the "live" components up, or reorder them so that there is no hole in the middle of the sequence, then the best option is to give the component objects a "deleted" flag/state that can be tested through a member function.
Such a "deleted" state does not cause the object to be removed from memory (that is just not possible in the middle of a larger block), but it does make it possible to mark the spot as not being in use for a component.

When you say you have "allocated some memory" you are likely talking about an array. Arrays are great because they have virtually no overhead and extremely fast access by index. But the bad thing about arrays is that they aren't very friendly for resizing. When you remove an element in the middle, all following elements have to be shifted back by one position.
But fortunately there are other data structures you can use, like a linked list or a binary tree, which allow quick removal of elements. C++ even implements these in the container classes std::list and std::set.
A list is great when you don't know beforehand how many elements you need, because it can shrink and grow dynamically without wasting any memory when you remove or add any elements. Also, adding and removing elements is very fast, no matter if you insert them at the beginning, in the end, or even somewhere in the middle.
A set is great for quick lookup. When you have an object and you want to know if it's already in the set, checking it is very quick. A set also automatically discards duplicates which is really useful in many situations (when you need duplicates, there is the std::multiset). Just like a list it adapts dynamically, but adding new objects isn't as fast as in a list (not as expensive as in an array, though).
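A short sketch of both containers with int elements (illustrative only):
#include <iterator>
#include <list>
#include <set>

int main()
{
    std::list<int> components = {1, 2, 3, 4, 5};
    std::set<int>  ids        = {1, 2, 3, 4, 5};

    // erasing in the middle of a list is O(1) once you have the iterator,
    // and no other elements are moved
    auto it = components.begin();
    std::advance(it, 2);
    components.erase(it);

    // membership check in a set is O(log n)
    bool present = (ids.count(4) > 0);
    (void)present;
}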

Two suggestions:
1) You can use a Linked List to store your components, and then not worry about holes.
Or if you need these holes:
2) You can wrap your component in an object that holds a pointer to the component, like so:
class ComponentWrap
{
public:
    Component* component = nullptr;
};
and check wrapper.component == nullptr to find out whether the component is deleted.
Exception way:
3) Put your code in a try catch block in case you hit a null pointer error.

Related

Moving values between lockfree lists

Background
I am trying to design and implement a lock-free hashmap using the chaining method in C++. Each hash table cell is supposed to contain a lock-free list. To enable resizing, my data structure is supposed to contain two arrays - a small one which is always available and a bigger one for resizing, for when the smaller one is no longer sufficient. When the bigger one is created, I would like the data stored in the small one to be transferred to the bigger one piece by piece, whenever any thread does something with the data structure (adds an element, searches, or removes one). When all the data is transferred, the bigger array is moved into the place of the smaller one and the latter is deleted. The cycle repeats whenever the array needs to be enlarged.
Problem
As mentioned before, each array is supposed to contain a list in each cell. I am trying to find a way to transfer a value or node from one lock-free list to another in such a manner that the value stays visible in either (or both) of the lists. This is needed to ensure that a search in the hash map won't give the user false negatives. So my questions are:
Is such lockfree list implementation possible?
If so, what would be the general concept of such list and "moving node/value" operation? I would be thankful for any pseudocode, C++ code or scientific article describing it.
To be able to resize the array, while maintaining the lock-free progress guarantees, you will need to use operation descriptors. Once the resize starts, add a descriptor that contains references to the old and the new arrays.
On any operation (add, search, or remove):
Add: search the old array first; if the element already exists there, move it to the new array before returning. Indicate, with a descriptor or a special null value, that the element has already been moved so that other threads don't attempt the move again.
Search: search the old array and move the element as indicated above.
Remove: remove also has to search the old array first.
Now the problem is that you will have a thread that has to verify that the move is complete, so that you can remove the descriptor and free up the old array. To maintain lock-freedom, you will need to have all active threads attempt to do this validation, thus it becomes very expensive.
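A rough sketch of the data involved in such a resize descriptor (this only shows the layout, not the full lock-free migration protocol; all names are illustrative):
#include <atomic>
#include <cstddef>

struct Node;   // lock-free list node; implementation elided

struct Table
{
    std::atomic<Node*>* buckets;   // one lock-free list head per cell
    std::size_t         size;
};

// Published (e.g. via a CAS on a shared pointer) when a resize starts.
// Threads that observe it help migrate buckets from old_table to
// new_table before completing their own add/search/remove.
struct ResizeDescriptor
{
    Table* old_table;
    Table* new_table;
    std::atomic<std::size_t> next_bucket_to_move{0};  // lets helping threads claim buckets
};

std::atomic<ResizeDescriptor*> current_resize{nullptr};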
You can look at:
https://dl.acm.org/citation.cfm?id=2611495
https://dl.acm.org/citation.cfm?id=3210408

Advice on how to implement std::vector using a circular array?

I am to write a C++ program that :
"Implements the vector ADT by means of an extendable array used in a circular fashion, so that insertions and deletions at the beginning and end run in constant time. (So not O(n)). Print the circular array before and after each insertion and deletion, You cannot use the STL."
This task seems very confusing to me. A std::vector is implemented using a dynamic array that is based on the concept of a stack, correct? Performing a deletion or insertion at the front suggests to me that this should be implemented as a queue or maybe a deque, not a vector. Also, a circular array would mean that when data is pushed onto an array that is full, old data becomes overwritten, right? So how should I know when to expand the vector's capacity?
If I'm not making sense here: basically, I need help understanding how I should go about implementing a dynamic circular array.
Yes, this is a homework assignment. No, I do not expect anyone to provide code for me, I only wish for someone to give me a push in the right direction as to how I should think about implementing this. Thank you.
I think you are actually being asked to implement a deque. The point of the "circularity" is that in a normal vector you cannot add an element at the beginning, since there is no free space and you would have to move all other elements to the right. What you can do instead is simulate a circle by putting the element at the end of the base array and remembering that that is where the first element is.
Example: 2, 3, -, -, 1 where 1 is first and 3 is last
So, basically, you insert elements circularly and remember where the first and the last elements are, so you can add to the beginning/end in O(1). Also, when the array is full, you have to move all the elements to a larger one. If you double the size, you still get amortized O(1) time.
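A minimal sketch of the wrap-around index arithmetic (the fixed capacity and names are illustrative; a real implementation would also grow the array when it is full):
#include <cstddef>

struct CircularArray
{
    static const std::size_t CAP = 8;    // fixed here for brevity
    int         data[CAP];
    std::size_t first = 0;                // index of the first (front) element
    std::size_t count = 0;                // number of stored elements

    // both assume count < CAP; growing is handled separately
    void push_back(int v)
    {
        data[(first + count) % CAP] = v;  // wraps past the end of the array
        ++count;
    }

    void push_front(int v)
    {
        first = (first + CAP - 1) % CAP;  // step back with wrap-around
        data[first] = v;
        ++count;
    }
};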
1) m_nextIn and m_nextOut - data attributes of class queue;
I find it useful to have two integers, labelled m_nextIn and m_nextOut ... these identify where in the circular array you 'insert' the next (i.e. youngest) obj instance into the queue, and where you 'delete' the oldest obj instance from the queue.
These two items also provide constant time insert and delete.
Don't get confused as to where the beginning or end of the queue is. The array starts at index 0, but this is not the beginning of your queue.
The beginning of your queue is at nextIn (which probably is not 0, but may be). Technique also known as round-robin (a research term).
2) empty and full - method attributes
Determining queue full / empty can be easily computed from m_nextIn and m_nextOut.
3) extendable
Since you are prohibited from using vector (which itself is extendable) you must implement this functionality yourself.
Note about your comment: The "dynamic memory" concept is not related to stack. (another research term)
Extension issues occur when your user code invokes 'insert' AND the array is already full. (Capture this in your test effort.) You will need to detect this condition, then do 4 things (see the sketch after this list):
3.1) allocate a new array (use new, and simply pick an appropriate size).
Hint - std::vector typically doubles its capacity each time a push_back() would overflow the current capacity.
3.2) transfer the entire contents of the old array to the new array, fixing all the indices as you go. Since the new array is bigger, the inserts are trivial.
3.3) delete the old array - i.e. you copied from the old array into the new array, so do you 'delete' the individual elements, or simply delete[] the array?
3.4) finish the 'insert' - you were in the middle of inserting another instance, right?
Good luck.
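A hedged sketch of the growth step (3.1-3.4) for an int queue; the doubling factor and names are illustrative, and copy semantics are omitted:
#include <cstddef>

class CircularQueue
{
public:
    CircularQueue() : m_data(new int[4]), m_capacity(4), m_nextOut(0), m_count(0) {}
    ~CircularQueue() { delete[] m_data; }

    void insert(int v)
    {
        if (m_count == m_capacity)
            grow();                                      // 3.1 - 3.3
        m_data[(m_nextOut + m_count) % m_capacity] = v;  // 3.4 finish the insert
        ++m_count;
    }

private:
    void grow()
    {
        std::size_t newCapacity = m_capacity * 2;        // doubling keeps amortized O(1)
        int* newData = new int[newCapacity];

        // copy in logical order, un-rotating the circle so the oldest
        // element lands at index 0 of the new array
        for (std::size_t i = 0; i < m_count; ++i)
            newData[i] = m_data[(m_nextOut + i) % m_capacity];

        delete[] m_data;                                 // 3.3 delete the old array
        m_data     = newData;
        m_capacity = newCapacity;
        m_nextOut  = 0;
    }

    int*        m_data;
    std::size_t m_capacity;
    std::size_t m_nextOut;   // index of the oldest element
    std::size_t m_count;
};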

Which data structure is sorted by insertion and has fast "contains" check?

I am looking for a data structure that preserves the order in which the elements were inserted and offers a fast "contains" predicate. I also need iterator and random access. Performance during insertion or deletion is not relevant. I am also willing to accept overhead in terms of memory consumption.
Background: I need to store a list of objects. The objects are instances of a class called Neuron and stored in a Layer. The Layer object has the following public interface:
class Layer {
public:
    Neuron *neuronAt(const size_t &index) const;
    NeuronIterator begin();
    NeuronIterator end();
    bool contains(const Neuron *const &neuron) const;
    void addNeuron(Neuron *const &neuron);
};
The contains() method is called quite often when the software runs; I've verified that using callgrind. I tried to circumvent some of the calls to contains(), but it is still a hot spot. Now I hope to optimize exactly this method.
I thought of using std::set, using the template argument to provide my own comparator struct. But the Neuron class itself does not give its position in the Layer away. Additionally, I'd like to have *someNeuronIterator = anotherNeuron to work without screwing up the order.
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linearly in memory. But that would invalidate the pointer I pass to addNeuron(); at the very least I'd have to change it to point to the new copy I created to keep things linearly aligned. Right?
Another idea was to use two data structures in the Layer object: A vector/list for the order, and a map/hash for lookup. But that would contradict my wish for an iterator that allowed operator* without a const reference, wouldn't it?
I hope somebody can hint an idea for a data structure or a concept that would satisfy my needs, or at least give me an idea for an alternative. Thanks!
If this contains check is really where you need the fastest execution, and assuming you can be a little intrusive with the source code, the fastest way to check if a Neuron belongs in a layer is to simply flag it when you insert it into a layer (ex: bit flag).
You have guaranteed O(1) checks at that point to see if a Neuron belongs in a layer and it's also fast at the micro-level.
If there can be numerous layer objects, this can get a little trickier, as you'll need a separate bit for each potential layer a neuron can belong to unless a Neuron can only belong in a single layer at once. This is reasonably manageable, however, if the number of layers are relatively fixed in size.
In the latter case, where a Neuron can only belong to one layer at a time, all you need is a back pointer to the Layer. To see if a Neuron belongs in a layer, simply check whether that back pointer points to the layer object.
If a Neuron can belong to multiple layers at once, but not too many at one time, then you could store like a little array of backpointers like so:
struct Neuron
{
    ...
    Layer* layers[4]; // use whatever small size that usually fits the common case
    Layer* ptr;       // points to 'layers', or to a heap allocation when there are more
    int num_layers;
};
Initialize ptr to point to layers if there are 4 or fewer layers to which the Neuron belongs. If there are more, allocate it on the free store. In the destructor, free the memory if ptr != layers. You can also optimize away num_layers if the common case is like 1 layer, in which case a null-terminated solution might work better. To see if a Neuron belongs to a layer, simply do a linear search through ptr. That's practically constant-time complexity with respect to the number of Neurons provided that they don't belong in a mass number of layers at once.
You can also use a vector here but you might reduce cache hits on those common case scenarios since it'll always put its contents in a separate block, even if the Neuron only belongs to like 1 or 2 layers.
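A sketch of that membership check, assuming the struct above and that num_layers counts the valid entries reachable through ptr:
bool belongs_to(const Neuron& n, const Layer* layer)
{
    // linear search over the handful of layers this neuron belongs to;
    // effectively constant time while that number stays small
    for (int i = 0; i < n.num_layers; ++i)
        if (n.ptr[i] == layer)
            return true;
    return false;
}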
This might be a bit different from what you were looking for with a general-purpose, non-intrusive data structure, but if your performance needs are really skewed towards these kinds of set operations, an intrusive solution is going to be the fastest in general. It's not quite as pretty and couples your element to the container, but hey, if you need max performance...
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linear in memory. But that would invalidate the pointer I pass to addNeuron(); [...]
Yes, but it won't invalidate indices. While not as convenient to use as pointers, if you're working with mass data like vertices of a mesh or particles of an emitter, it's common to use indices here to avoid the invalidation and possibly to save an extra 32-bits per entry on 64-bit systems.
Update
Given that Neurons only exist in one Layer at a time, I'd go with the back pointer approach. Seeing if a neuron belongs to a layer becomes a simple matter of checking if the back pointer points to the same layer.
Since there's an API involved, I'd suggest, just because it sounds like you're pushing around a lot of data and have already profiled it, that you focus on an interface which revolves around aggregates (layers, e.g.) rather than individual elements (neurons). It'll just leave you a lot of room to swap out underlying representations when your clients aren't performing operations at the individual scalar element-type interface.
With the O(1) contains implementation and the unordered requirement, I'd go with a simple contiguous structure like std::vector. However, you do expose yourself to potential invalidation on insertion.
Because of that, if you can, I'd suggest working with indices here. However, that become a little unwieldy since it requires your clients to store both a pointer to the layer in which a neuron belongs in addition to its index (though if you do this, the backpointer becomes unnecessary as the client is tracking where things belong).
One way to mitigate this is to simply use something like std::vector<Neuron*> or ptr_vector if available. However, that can expose you to cache misses and heap overhead, and if you want to optimize that, this is where the fixed allocator comes in handy. However, that's a bit of a pain with alignment issues and a bit of a research topic, and so far it seems like your main goal is not to optimize insertion or sequential access quite as much as this contains check, so I'd start with the std::vector<Neuron*>.
You can get O(1) contains-check, O(1) insert and preserve insertion order. If you are using Java, looked at LinkedHashMap. If you are not using Java, look at LinkedHashMap and figure out a parallel data structure that does it or implement it yourself.
It's just a hashmap combined with a doubly linked list. The linked list preserves order and the hashmap allows O(1) access. So when you insert an element, it makes an entry with the key, and the map points to the node in the linked list where your data resides. To look something up, you go to the hash table to find the pointer directly to your linked-list node (not the head), and get the value in O(1). To access elements sequentially, you just traverse the linked list.
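In C++, a minimal sketch of the same idea built from the standard containers (the class and its members are illustrative):
#include <iterator>
#include <list>
#include <unordered_map>

struct Neuron;   // element type, as in the question

class OrderedNeuronSet
{
public:
    void insert(Neuron* n)
    {
        if (index_.count(n)) return;              // discard duplicates
        order_.push_back(n);                      // remember insertion order
        index_[n] = std::prev(order_.end());      // O(1) handle into the list
    }

    bool contains(Neuron* n) const { return index_.count(n) != 0; }

    // iterate in insertion order
    std::list<Neuron*>::const_iterator begin() const { return order_.begin(); }
    std::list<Neuron*>::const_iterator end()   const { return order_.end(); }

private:
    std::list<Neuron*> order_;
    std::unordered_map<Neuron*, std::list<Neuron*>::iterator> index_;
};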
A heap sounds like it could be useful to you. It's like a tree, but the newest element is always inserted at the top, and then works its way down based on its value, so there is a method to quickly check if it's there.
Otherwise, you could store a hash table (quick method to check if the neuron is contained in the table) with the key for the neuron, and values of: the neuron itself, and the timestep upon which the neuron was inserted (to check its chronological insertion time).

Create dynamic array of objects

I want to create a dynamic array of a specific object that would also support adding new objects to the array.
I'm trying to solve this as part of an exercise in my course. In this exercise we are not supposed to use std::vector.
For example, let's say I have a class named Product and declare a pointer:
Products* products;
then I want to support the following:
products = new Product();
/* code here... */
products[1] = new Product(); // and so on...
I know the current syntax could lead to access violation. I don't know the size of the array in advance, as it can change throughout the program.
The questions are:
How can I write it without vectors?
Do I have to use double pointers (2-dimension)?
Every time I want to add a new object, do I have to copy the array to the new array (with +1 size), and then delete the array?
You should not write this without std::vector. If you for some reason need to, your copying with every resize is by far the easiest option.
I do not see how that would help. (I.e. no)
As mentioned above, this is by far the easiest method if you cannot use std::vector. Everything else would be (partially) reinventing one standard library container or the other, which is hard.
You have to do your own memory management; more specifically, with respect to your other (related) questions:
No, not if you have one contiguously allocated chunk of memory where your data lives.
Yes, if that is your chosen implementation method. However, if you don't want to use one large memory chunk, you can use a (doubly) linked list, which does not require you to copy the whole array every time.
I see people already answered your specific questions, so I'll give a more general answer.
You must implement a way to do it by yourself, but there are lots of Abstract Data Types that you can use, as far as I can see the simplest would be a linked list, such as the following:
class ProductNode
{
public:
    ProductNode() : _data(nullptr), _next(nullptr)
    {
    }

    void setProduct(Product* p) // setter for the product pointer
    {
        this->_data = p;
    }

    Product getProduct() // getter for the product (returned by value)
    {
        return *(this->_data);
    }

    void addNext() // allocate memory for another ProductNode in '_next'
    {
        if (!this->_next)
        {
            this->_next = new ProductNode();
        }
    }

    ProductNode* getNext() // return the address stored in '_next'
    {
        return this->_next;
    }

    ~ProductNode(); // delete this node and every node after it (recursive for a linked list)

private:
    Product* _data;
    ProductNode* _next;
};
Declare a head variable and go from there.
Of course that most of the functions here should be implemented otherwise, it was coded quickly so you could see the basics that you need for this assignment.
That's one way.
Also you can make your own data type.
Or use some other data types for abstraction of the data.
What you probably should do (i.e. what I believe you're expected to do) is write your own class that represents a dynamic array (i.e. you're going to reinvent parts of std::vector.)
Despite what many around here say, this is a worthwhile exercise and should be part of a normal computer science curriculum.
Use a dynamically allocated array which is a member of your class.
If you're using a dynamically allocated array of Product*, you'll be storing a Product**, so yes, in a way. It's not necessary to have "double pointers" for the functionality itself, though.
Technically no - you can allocate more than necessary and only reallocate and copy when you run out of space. This is what vector does to improve its time complexity.
Expanding for each element is a good way to start though, and you can always change your strategy later.
(Get the simple way working first, get fancy if you need to.)
It's probably easiest to first implement an array of int (or some other basic numerical type) and make sure that it works, and then change the type of the contents.
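A minimal sketch of such a class for int, using the grow-by-doubling strategy mentioned above (copy/move semantics and error handling are omitted for brevity):
#include <cstddef>

class DynamicArray
{
public:
    DynamicArray() : m_data(new int[1]), m_capacity(1), m_size(0) {}
    ~DynamicArray() { delete[] m_data; }

    void push_back(int value)
    {
        if (m_size == m_capacity)
            grow();
        m_data[m_size++] = value;
    }

    int&        operator[](std::size_t i) { return m_data[i]; }
    std::size_t size() const              { return m_size; }

private:
    void grow()
    {
        std::size_t newCapacity = m_capacity * 2;  // doubling gives amortized O(1) push_back
        int* newData = new int[newCapacity];
        for (std::size_t i = 0; i < m_size; ++i)   // copy the existing elements over
            newData[i] = m_data[i];
        delete[] m_data;
        m_data = newData;
        m_capacity = newCapacity;
    }

    int*        m_data;
    std::size_t m_capacity;
    std::size_t m_size;
};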
I suppose by "Products *products;" you mean "Products" is a vector-like container.
1) How can I write it without vectors?
As a linked list. Instantiating a "Products products" will give you an empty linked list.
Overriding operator[] would insert/replace the element in the list, so you need to scan the list to find the right place. If several elements are missing before the requested index, you may need to append such "neutral" elements before your element. Doing this through a "Product *products" pointer is not feasible if you plan to override operator[] to handle the addition of elements; you would have to declare "Products products" (an object, not a pointer) instead.
2) Do I have to use double pointers (2-dimension)?
This question lacks precision. Do you mean as in "typedef Product *Products;" and then "Products *products"? As long as you keep a " * " between "Products" and "products", there is no way to override operator[] to handle the addition of elements.
3) Every time I want to add a new object, do I have to copy the array to the new array (with +1 size), and then delete the array?
If you stick with an array, you can keep the number of reallocations down to O(log2(n)) by simply doubling the array size each time it grows (assuming you keep a terminator or an embedded count). Or just use a linked list instead to avoid copying all the elements before adding one.

Concatenate 2 STL vectors in constant O(1) time

I'll give some context as to why I'm trying to do this, but ultimately the context can be ignored as it is largely a classic Computer Science and C++ problem (which must surely have been asked before, but a couple of cursory searches didn't turn up anything...)
I'm working with (large) real time streaming point clouds, and have a case where I need to take 2/3/4 point clouds from multiple sensors and stick them together to create one big point cloud. I am in a situation where I do actually need all the data in one structure, whereas normally when people are just visualising point clouds they can get away with feeding them into the viewer separately.
I'm using Point Cloud Library 1.6, and on closer inspection its PointCloud class (under <pcl/point_cloud.h> if you're interested) stores all data points in an STL vector.
Now we're back in vanilla CS land...
PointCloud has a += operator for adding the contents of one point cloud to another. So far so good. But this method is pretty inefficient - if I understand it correctly, it 1) resizes the target vector, then 2) runs through all Points in the other vector, and copies them over.
This looks to me like a case of O(n) time complexity, which normally might not be too bad, but is bad news when dealing with at least 300K points per cloud in real time.
The vectors don't need to be sorted or analysed, they just need to be 'stuck together' at the memory level, so the program knows that once it hits the end of the first vector it just has to jump to the start location of the second one. In other words, I'm looking for an O(1) vector merging method. Is there any way to do this in the STL? Or is it more the domain of something like std::list::splice?
Note: This class is a pretty fundamental part of PCL, so 'non-invasive surgery' is preferable. If changes need to be made to the class itself (e.g. changing from vector to list, or reserving memory), they have to be considered in terms of the knock on effects on the rest of PCL, which could be far reaching.
Update: I have filed an issue over at PCL's GitHub repo to get a discussion going with the library authors about the suggestions below. Once there's some kind of resolution on which approach to go with, I'll accept the relevant suggestion(s) as answers.
A vector is not a list, it represents a sequence, but with the additional requirement that elements must be stored in contiguous memory. You cannot just bundle two vectors (whose buffers won't be contiguous) into a single vector without moving objects around.
This problem has been solved many times before, for example with string rope classes.
The basic approach is to make a new container type that stores pointers to point clouds. This is like a std::deque except that yours will have chunks of variable size. Unless your clouds chunk into standard sizes?
With this new container your iterators start in the first chunk, proceed to the end then move into the next chunk. Doing random access in such a container with variable sized chunks requires a binary search. In fact, such a data structure could be written as a distorted form of B+ tree.
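A rough sketch of such a chunk container, assuming the individual clouds stay where they are and only pointers to them are stored (random access and the B+-tree-style index are left out):
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };
using Cloud = std::vector<Point>;

// Concatenation is O(1): appending a cloud only records a pointer to it.
class ChunkedCloud
{
public:
    void append(Cloud* chunk) { chunks_.push_back(chunk); }

    // visit every point across all chunks, in chunk order
    template <typename F>
    void for_each_point(F f) const
    {
        for (Cloud* c : chunks_)
            for (Point& p : *c)
                f(p);
    }

    std::size_t size() const
    {
        std::size_t n = 0;
        for (Cloud* c : chunks_)
            n += c->size();
        return n;
    }

private:
    std::vector<Cloud*> chunks_;
};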
There is no vector equivalent of splice - there can't be, specifically because of the memory layout requirements, which are probably the reason it was selected in the first place.
There's also no constant-time way to concatenate vectors.
I can think of one (fragile) way to concatenate raw arrays in constant time, but it depends on them being aligned on page boundaries at both the beginning and the end, and then re-mapping them to be adjacent. This is going to be pretty hard to generalise.
There's another way to make something that looks like a concatenated vector, and that's with a wrapper container which works like a deque, and provides a unified iterator and operator[] over them. I don't know if the point cloud library is flexible enough to work with this, though. (Jamin's suggestion is essentially to use something like this instead of the vector, and Zan's is roughly what I had in mind).
No, you can't concatenate two vectors by a simple link, you actually have to copy them.
However! If you implement move-semantics in your element type, you'd probably get significant speed gains, depending on what your element contains. This won't help if your elements don't contain any non-trivial types.
Further, if you have your vector reserve the needed memory well in advance, that also helps speed things up by not requiring a resize (which would cause an undesired huge new allocation, possibly a search for a free block of that size, and then a huge memcpy).
Barring that, you might want to create some kind of mix between linked lists and vectors, with each 'element' of the list being a vector of, say, 10k points, so you only need to follow list links once every 10k elements, but it lets you grow dynamically much more easily and makes your concatenation a breeze.
std::list<std::vector<element>> forIllustrationOnly; // just roll your own custom type
index = 52403;
blockIndex = index / 10000;   // which 10k-element block
offset     = index % 10000;   // position within that block
forIllustrationOnly[blockIndex][offset]          // still fairly fast lookups
forIllustrationOnly.push_back(vector_of_points)  // much faster appending and removing of blocks of points
You will not get this scaling behaviour with a vector, because with a vector, you do not get around the copying. And you can not copy an arbitrary amount of data in fixed time.
I do not know PointCloud, but if you can use other list types, e.g. a linked list, this behaviour is quite possible. You might find a linked list implementation which works in your environment and which can simply splice the second list onto the end of the first list, as you imagined.
Take a look at Boost range join at http://www.boost.org/doc/libs/1_54_0/libs/range/doc/html/range/reference/utilities/join.html
This will take 2 ranges and join them. Say you have vector1 and vector2.
You should be able to write
auto combined = boost::join(vector1, vector2);
Then you can use combined with algorithms, etc as needed.
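A small usage sketch, assuming Boost.Range is available; note that the joined view only references the original vectors, so they must outlive it:
#include <boost/range/join.hpp>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> vector1 = {1, 2, 3};
    std::vector<int> vector2 = {4, 5, 6};

    // O(1): no elements are copied, 'combined' is just a view over both vectors
    auto combined = boost::join(vector1, vector2);

    for (int x : combined)
        std::cout << x << ' ';   // prints: 1 2 3 4 5 6
}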
No O(1) copy for vector, ever, but you should check:
Is the element type trivially copyable (i.e. copyable with memcpy)?
If so, is my vector implementation leveraging this fact, or is it naively looping over all 300k elements executing a trivial assignment (or worse, a copy-constructor call) for each element?
What I have seen is that, while both memcpy and an assignment loop have O(n) complexity, a solution leveraging memcpy can be much, much faster.
So, the problem might be that the vector implementation is suboptimal for trivial types.
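A quick way to check the first point at compile time, assuming the element type is called Point (C++11 <type_traits>):
#include <type_traits>

struct Point { float x, y, z; };

// If this fails, the vector cannot fall back to a raw memcpy-style copy
// when it reallocates or concatenates.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point is not trivially copyable");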