I have a group of primitive data types that need to be tracked by some sort of ID.
I am looking for a data structure that keeps them tightly packed, and the collection needs to grow as needed.
The C++ std::vector would be a perfect solution, except that when it deletes an element, some indices can change, so I could no longer reference an element by its index. I also thought about std::map, but it is not tightly packed.
My solution would be to rewrite std::vector such that when an entry is deleted, it is marked as 'dirty', and when a new entry is needed, it tries to reuse 'dirty' entries and, if that is not possible, simply grows like std::vector.
But my problem appears to be common, and I am sure someone has run into it before. Do I need to reinvent the wheel, or could someone point me in another direction?
You could use a vector<boost::optional<T>>.
To delete an item:
vec[val.id] = boost::none; // now this slot is empty
To add an item:
auto it = std::find(vec.begin(), vec.end(), boost::none);
if (it == vec.end()) {
    val.id = vec.size(); // or however
    vec.push_back(val);
}
else {
    // this is an empty slot, overwrite it
    val.id = std::distance(vec.begin(), it);
    *it = val;
}
To make that more efficient, you can also keep a separate list of free indices when you erase - but that's just an optimization on top of the general idea.
Your idea to keep track of deleted ("dirty" in your parlance) entries is commonly done, and referred to as using a "free list."
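To make the free-list idea concrete, here is a minimal sketch (the class and member names are illustrative, not taken from any library):

#include <cstddef>
#include <vector>

// Free-list-backed storage: slots never move, so an index is a stable ID.
template <typename T>
class SlotVector {
public:
    std::size_t add(const T& value) {
        if (!free_.empty()) {
            std::size_t id = free_.back();   // reuse a previously freed ("dirty") slot
            free_.pop_back();
            data_[id] = value;
            return id;
        }
        data_.push_back(value);              // no hole available: grow like std::vector
        return data_.size() - 1;
    }

    void remove(std::size_t id) {
        free_.push_back(id);                 // mark the slot as reusable
    }

    T& operator[](std::size_t id) { return data_[id]; }

private:
    std::vector<T> data_;                    // tightly packed payload
    std::vector<std::size_t> free_;          // indices of free slots
};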
Another idea would be to simply use the pointer (address of each datum) as the ID.
I'm late to the party, but what about swapping the element to be deleted with the last element and popping that last element off the back of the vector? Free lists are also a good solution, but they might be over-engineering it depending on what you're doing.
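A minimal sketch of that swap-and-pop idea (the helper name is illustrative); note that the element that was previously at the back takes over the erased index, so any external ID pointing to it has to be remapped by the caller:

#include <cstddef>
#include <utility>
#include <vector>

// O(1) removal that keeps the vector tightly packed.
template <typename T>
void swap_and_pop(std::vector<T>& v, std::size_t index) {
    std::swap(v[index], v.back());   // move the last element into the hole
    v.pop_back();                    // shrink the vector by one
}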
I want to create a dynamic array of a specific object that would also support adding new objects to the array.
I'm trying to solve this as part of an exercise in my course. In this exercise we are not supposed to use std::vector.
For example, let's say I have a class named Product and declare a pointer:
Products* products;
then I want to support the following:
products = new Product();
/* code here... */
products[1] = new Product(); // and so on...
I know the current syntax could lead to access violation. I don't know the size of the array in advance, as it can change throughout the program.
The questions are:
How can I write it without vectors?
Do I have to use double pointers (2-dimension)?
Every time I want to add a new object, do I have to copy the array to the new array (with +1 size), and then delete the array?
1) You should not write this without std::vector. If you for some reason need to, copying with every resize is by far the easiest option.
2) I do not see how that would help (i.e. no).
3) As mentioned above, this is by far the easiest method if you cannot use std::vector. Everything else would be (partially) reinventing one standard library container or another, which is hard.
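To illustrate the copy-with-every-resize approach mentioned above, here is a minimal sketch (the helper name and the manual new[]/delete[] handling are just one way to do it):

#include <cstddef>

class Product { /* ... */ };

// Grow a raw array by one element: allocate size + 1, copy, delete the old block.
// Simple, but O(n) per insertion.
Product* append_copy(Product* old_data, std::size_t old_size, const Product& value) {
    Product* new_data = new Product[old_size + 1];
    for (std::size_t i = 0; i < old_size; ++i) {
        new_data[i] = old_data[i];   // copy the existing elements
    }
    new_data[old_size] = value;      // place the new element at the end
    delete[] old_data;               // release the old block
    return new_data;
}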
You have to do your own memory management; more specifically, with regard to your other (related) questions:
2) No, not if you have a contiguously allocated chunk of memory where your data lives.
3) Yes, if a contiguous chunk (as in 2) is your desired implementation method. However, if you don't want to use one large memory chunk, you can use a (doubly) linked list, which does not require you to copy the whole array every time.
I see people already answered your specific questions, so I'll answer a more general answer.
You must implement it yourself, but there are lots of abstract data types you can use. As far as I can see, the simplest would be a linked list, such as the following:
class ProductNode
{
public:
    ProductNode() : _data(NULL), _next(NULL)
    {
    }

    void setProduct(Product* p) // setter for the product pointer
    {
        this->_data = p;
    }

    Product* getProduct() // getter for the product pointer
    {
        return this->_data;
    }

    void addNext() // allocate memory for another ProductNode in '_next'
    {
        if(!this->_next)
        {
            this->_next = new ProductNode();
        }
    }

    ProductNode* getNext() // get the allocated memory, the address that is in '_next'
    {
        return this->_next;
    }

    ~ProductNode() // delete every node from this one forward; recursive for a linked list
    {
        delete this->_next;
    }

private:
    Product* _data;
    ProductNode* _next;
};
Declare a head variable and go from there.
Of course, most of the functions here could be implemented differently; this was coded quickly so you can see the basics you need for this assignment.
That's one way.
You can also make your own data type, or use some other data types as an abstraction over the data.
What you probably should do (i.e. what I believe you're expected to do) is write your own class that represents a dynamic array (i.e. you're going to reinvent parts of std::vector.)
Despite what many around here say, this is a worthwhile exercise and should be part of a normal computer science curriculum.
1) Use a dynamically allocated array which is a member of your class.
2) If you're using a dynamically allocated array of Product*, you'll be storing a Product**, so yes, in a way. It's not necessary to have "double pointers" for the functionality itself, though.
3) Technically no - you can allocate more than necessary and only reallocate and copy when you run out of space. This is what vector does to improve its time complexity.
Expanding for each element is a good way to start though, and you can always change your strategy later.
(Get the simple way working first, get fancy if you need to.)
It's probably easiest to first implement an array of int (or some other basic numerical type) and make sure that it works, and then change the type of the contents.
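For example, a bare-bones sketch of such a class for int might look like the following (the class name and growth factor are arbitrary; copy construction/assignment are omitted for brevity):

#include <cstddef>

// A growable array of int that doubles its capacity when full,
// so push_back is amortized O(1) - roughly what std::vector does.
class IntArray {
public:
    IntArray() : data_(nullptr), size_(0), capacity_(0) {}
    ~IntArray() { delete[] data_; }

    void push_back(int value) {
        if (size_ == capacity_) {
            grow(capacity_ == 0 ? 1 : capacity_ * 2);
        }
        data_[size_++] = value;
    }

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    void grow(std::size_t new_capacity) {
        int* new_data = new int[new_capacity];
        for (std::size_t i = 0; i < size_; ++i) {
            new_data[i] = data_[i];   // copy the existing elements over
        }
        delete[] data_;
        data_ = new_data;
        capacity_ = new_capacity;
    }

    int* data_;
    std::size_t size_;
    std::size_t capacity_;
};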
I suppose by "Products *products;" you mean "Products" is a vector-like container.
1) How can I write it without vectors?
As a linked list. Instantiating a "Products products" will give you an empty linked list.
Overriding operator[] will insert or replace the element in the list, so you need to scan the list to find the right place. If several elements are missing before the right place, you may need to append "neutral" placeholder elements before your element. Doing this through a "Products *products" pointer is not feasible if you plan to override operator[] to handle addition of elements; declare a "Products products" object instead.
2) Do I have to use double pointers (2-dimension)?
This question lacks precision. Do you mean "typedef Product *Products;" and then "Products *products"? As long as you keep a "*" between "Products" and "products", there is no way to override operator[] to handle addition of elements.
3) Every time I want to add a new object, do I have to copy the array to the new array (with +1 size), and then delete the array?
If you stick with an array, you can keep the number of reallocations down to O(log2(n)) by doubling the array size whenever it fills up (assuming you keep a terminator or an embedded count). Or just use a linked list instead, to avoid copying all the elements every time you add one.
I'm trying to figure out the best way to do a cache for resources. I am mainly looking for native C/C++/C++11 solutions (i.e. I don't have boost and the likes as an option).
What I am doing when retrieving from the cache is something like this:
Object *ResourceManager::object_named(const char *name) {
    if (_object_cache.find(name) == _object_cache.end()) {
        _object_cache[name] = new Object();
    }
    return _object_cache[name];
}
Where _object_cache is defined something like: std::unordered_map <std::string, Object *> _object_cache;
What I am wondering about is the time complexity of doing this: does find trigger a linear-time search, or is it done as some kind of look-up operation?
I mean, if I do _object_cache["something"]; on the given example, it will either return the object or, if it doesn't exist, insert a default-constructed value, which is not what I want. I find this a bit counter-intuitive; I would have expected it to report in some way (returning nullptr, for example) that a value for the key couldn't be retrieved, not second-guess what I wanted.
But again, if I do a find on the key, does it trigger a big search that runs in linear time (since the key will not be found, does it look at every key)?
Is this a good way to do it, or does anyone have suggestions? Perhaps it's possible to use some kind of look-up to know whether the key is available or not. I may access the cache often, and if time is spent searching I would like to eliminate it, or at least make it as fast as possible.
Thankful for any input on this.
The default construction (triggered by _object_cache["something"]) is what you want; a value-initialized pointer (e.g. Object *) is nullptr (8.5p6b1, footnote 103).
So:
auto &ptr = _object_cache[name];
if (!ptr) ptr = new Object;
return ptr;
You use a reference into the unordered map (auto &ptr) as your local variable so that you assign into the map and set your return value in the same operation. In C++03 or if you want to be explicit, write Object *&ptr (a reference to a pointer).
Note that you should probably be using unique_ptr rather than a raw pointer to ensure that your cache manages ownership.
By the way, find has the same performance as operator[]; average constant, worst-case linear (only if every key in the unordered map has the same hash).
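Picking up the unique_ptr suggestion, a sketch of the same lookup with owned pointers might look like this (assuming C++11 and a default-constructible Object):

#include <memory>
#include <string>
#include <unordered_map>

class Object { /* ... */ };

class ResourceManager {
public:
    // Returns the cached Object for `name`, creating it on first use.
    // The map owns the objects, so nothing leaks when the cache goes away.
    Object* object_named(const std::string& name) {
        auto& ptr = _object_cache[name];   // default-constructs an empty unique_ptr on first access
        if (!ptr) {
            ptr.reset(new Object());       // first access: create and store the resource
        }
        return ptr.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<Object>> _object_cache;
};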
Here's how I'd write this:
auto it = _object_cache.find(name);
return it != _object_cache.end()
? it->second
: _object_cache.emplace(name, new Object).first->second;
The complexity of find on a std::unordered_map is O(1) on average, especially with std::string keys, which hash well and so produce very few collisions. Even though the method is named find, it does not do a linear scan, as you feared.
If you want to do some kind of caching, this container is definitely a good start.
Note that a cache typically is not just a fast O(1) access but also a bounded data structure. The std::unordered_map will dynamically increase its size when more and more elements are added. When resources are limited (e.g. reading huge files from disk into memory), you want a bounded and fast data structure to improve the responsiveness of your system.
In contrast, a cache will use an eviction strategy whenever size() reaches capacity(), by replacing the least valuable element.
You can implement a cache on top of a std::unordered_map. The eviction strategy can then be implemented by redefining the insert() member. If you want to go for an N-way (for small and fixed N) associative cache (i.e. one item can replace at most N other items), you could use the bucket() interface to replace one of the bucket's entries.
For a fully associative cache (i.e. any item can replace any other item), you could use a Least Recently Used eviction strategy by adding a std::list as a secondary data structure:
using key_tracker_type = std::list<K>;
using key_to_value_type = std::unordered_map<
K,std::pair<V,typename key_tracker_type::iterator>
>;
By wrapping these two structures inside your cache class, you can define the insert() to trigger a replace when your capacity is full. When that happens, you pop_front() the Least Recently Used item and push_back() the current item into the list.
On Tim Day's blog there is an extensive example with full source code that implements the above cache data structure. Its implementation can also be done efficiently using Boost.Bimap or Boost.MultiIndex.
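To make the list + unordered_map combination concrete, here is a minimal LRU sketch (the class name, capacity handling, and member names are illustrative; see the linked example for a complete implementation):

#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

// Least-Recently-Used cache: the list keeps keys in recency order (most recent
// at the back); the map stores the value plus an iterator into that list.
template <typename K, typename V>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void insert(const K& key, const V& value) {
        auto it = map_.find(key);
        if (it != map_.end()) {              // already cached: drop the old entry
            tracker_.erase(it->second.second);
            map_.erase(it);
        } else if (map_.size() >= capacity_) {
            evict();                         // full: replace the least valuable element
        }
        tracker_.push_back(key);
        map_.emplace(key, std::make_pair(value, std::prev(tracker_.end())));
    }

    V* find(const K& key) {
        auto it = map_.find(key);
        if (it == map_.end()) return nullptr;
        tracker_.splice(tracker_.end(), tracker_, it->second.second); // mark as most recently used
        return &it->second.first;
    }

private:
    void evict() {
        map_.erase(tracker_.front());        // least recently used key
        tracker_.pop_front();
    }

    std::size_t capacity_;
    std::list<K> tracker_;                                                        // key_tracker_type
    std::unordered_map<K, std::pair<V, typename std::list<K>::iterator>> map_;   // key_to_value_type
};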
The insert/emplace interfaces to map/unordered_map are enough to do what you want: find the position, and insert if necessary. Since the mapped values here are pointers, ekatmur's response is ideal. If your values are fully-fledged objects in the map rather than pointers, you could use something like this:
Object& ResourceManager::object_named(const char *name, const Object& initialValue) {
return _object_cache.emplace(name, initialValue).first->second;
}
The values name and initialValue make up the arguments for the key-value pair that will be inserted if there is no key equivalent to name. emplace returns a pair: second indicates whether anything was inserted (i.e. the key in name is a new one) - we don't care about that here; first is the iterator pointing to the (perhaps newly created) key-value entry whose key is equivalent to name. So if the key was already there, dereferencing first gives the original Object for the key, which has not been overwritten with initialValue; otherwise, the key was newly inserted using the value of name, the entry's value portion was copied from initialValue, and first points to that entry.
ekatmur's response is equivalent to this:
Object* ResourceManager::object_named(const char *name) {
    bool res;
    auto iter = _object_cache.end();
    std::tie(iter, res) = _object_cache.emplace(name, nullptr);
    if (res) {
        iter->second = new Object(); // we inserted a null pointer - now replace it
    }
    return iter->second;
}
but profits from the fact that the default-constructed pointer value created by operator[] is null to decide whether a new Object needs to be allocated. It's more succinct and easier to read.
Let's say I have allocated some memory and filled it with a set of objects of the same type; we'll call these components.
Say one of these components needs to be removed, what is a good way of doing this such that the "hole" created by the component can be tested for and skipped by a loop iterating over the set of objects?
The inverse should also be true, I would like to be able to test for a hole in order to store new components in the space.
I'm thinking of zeroing the memory (memset) & checking for 0...
boost::optional<component> seems to fit your needs exactly. Put those in your storage, whatever that happens to be. For example, with std::vector:
// initialize the vector with 100 non-components
std::vector<boost::optional<component>> components(100);

// adding a component at position 15
components[15] = component(x, y, z);

// deleting a component at position 82
components[82] = boost::none;

// looping through and checking for existence
for (auto& opt : components)
{
    if (opt) // component exists
    {
        operate_on_component(*opt);
    }
    else // component does not exist
    {
        // whatever
    }
}

// move components to the front, non-components to the back
std::partition(components.begin(), components.end(),
    [](boost::optional<component> const& opt) -> bool { return static_cast<bool>(opt); });
The short answer is that it depends on how you store it in memory.
For example, the C++ standard requires that a vector's elements be allocated contiguously.
If you know the size of the object, you can use sizeof and pointer arithmetic to compute an element's location in memory.
Good luck.
There are at least two solutions:
1) Mark the hole with some flag and then skip it when processing. Benefit: 'deletion' is very fast (just setting a flag). If the object is not that small, even adding a "bool alive" flag is not hard to do.
2) Move the hole to the end of the pool by swapping it with a live object.
This problem comes up when storing and processing particle systems; you can find some suggestions in that context.
If it is not possible to move the "live" components up, or to reorder them so that there is no hole in the middle of the sequence, then the best option is to give the component objects a "deleted" flag/state that can be tested through a member function.
Such a "deleted" state does not cause the object to be removed from memory (that is just not possible in the middle of a larger block), but it does make it possible to mark the spot as not being in use for a component.
When you say you have "allocated some memory" you are likely talking about an array. Arrays are great because they have virtually no overhead and extremely fast access by index. But the bad thing about arrays is that they aren't very friendly for resizing. When you remove an element in the middle, all following elements have to be shifted back by one position.
But fortunately there are other data structures you can use, like a linked list or a binary tree, which allow quick removal of elements. C++ even implements these in the container classes std::list and std::set.
A list is great when you don't know beforehand how many elements you need, because it can shrink and grow dynamically without wasting any memory when you remove or add any elements. Also, adding and removing elements is very fast, no matter if you insert them at the beginning, in the end, or even somewhere in the middle.
A set is great for quick lookup. When you have an object and you want to know if it's already in the set, checking it is very quick. A set also automatically discards duplicates which is really useful in many situations (when you need duplicates, there is the std::multiset). Just like a list it adapts dynamically, but adding new objects isn't as fast as in a list (not as expensive as in an array, though).
Two suggestions:
1) You can use a Linked List to store your components, and then not worry about holes.
Or if you need these holes:
2) You can wrap your component in an object that holds a pointer to the component, like so:
class ComponentWrap
{
public:
    Component* component = nullptr;
};
and check whether wrapper.component == nullptr to find out if the component is deleted.
The exception way:
3) Put your code in a try/catch block in case you hit a null-pointer error (note, though, that in standard C++ dereferencing a null pointer is undefined behaviour rather than a catchable exception, so this is not reliable).
I'm coding a program where I want to draw a card and then delete it so that it doesn't get drawn again.
I have a vector of Cards (a class containing 2 structs that define Suit and Value) called deck, and I don't really know how to use iterators very well. Here's a code snippet:
void Player::discardCard(CardDeck masterDeck)
{
    cout << "Erasing: " << masterDeck.getDeck().at(cardSelect).toString() << endl;

    /* Attempt 1 */
    masterDeck.getDeck().erase(masterDeck.getDeck().begin() + cardSelect);

    /* Attempt 2 */
    vector<Card>::iterator itr;
    itr = masterDeck.getDeck().begin() + cardSelect;
    masterDeck.getDeck().erase(itr);
}
cardSelect has the location of the card I'm going to delete.
It's generated randomly within the boundaries of 0 and the size of deck; therefore it shouldn't be pointing to a position out of boundaries.
Every time I run it I get the following error:
"Expression: vector erase iterator outside range"
I really don't know what to do; hopefully someone can help me. Thanks in advance!
My bet is that getDeck returns the vector by value. That causes itr to point into one copy of the vector while erase operates on another, which is why you get the error. You should return the vector by reference. Change getDeck's signature to this:
vector<Card>& getDeck()
Let me go off topic first. Your design is a little suspect. Passing in CardDeck by value is almost certainly not what you want, but that's beside the point. Why should your Player class have all this inside knowledge about the private innards of CardDeck? It shouldn't care that you store the deck as a vector or a deque (ha ha), or what the structure is. It just shouldn't know that. All it knows is that it wants to discard a card.
masterDeck.Discard(selectedCard);
Also note that selectedCard has to be between 0 and ONE LESS than the size of the deck, but even that's probably not your problem (although it will be 1/53rd of the time)
So to answer your question we really would need to know a little more about masterDeck. Did you implement a valid custom copy constructor? Since you're passing by value, odds are good you're not correctly copying the underlying vector; in fact it's probably empty, and none of the deletes will work. Try checking the size. If you don't ever want the deck copied, you can let the compiler help you by declaring a private copy constructor and never defining it. See Scott Meyers' Effective C++, Item 11.
Finally one last piece of advice, I believe once you erase with your iterator, you invalidate it. The vector might get reallocated (almost certainly will if you erase anywhere but the end). I'm just telling you so that you don't try to call erase more than once on the same iterator. One of the tricky things about iterators is how easy it can be to invalidate them, which is why you often seen checks for iter != coll.end().
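For illustration, a Discard member on CardDeck might look like the sketch below (the member names and bounds handling are assumptions, not your actual class):

#include <cstddef>
#include <stdexcept>
#include <vector>

struct Card { /* suit and value ... */ };

class CardDeck {
public:
    // Remove the card at `index`; the deck validates its own bounds.
    void Discard(std::size_t index) {
        if (index >= deck_.size()) {
            throw std::out_of_range("Discard: index outside the deck");
        }
        deck_.erase(deck_.begin() + static_cast<std::ptrdiff_t>(index));
    }

    std::size_t size() const { return deck_.size(); }

private:
    std::vector<Card> deck_;
};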
"It's generated randomly within the boundaries of 0 and the size of deck".
The valid range should be between 0 and the size of the deck minus 1; generating the size itself can cause a range error at run time.
I have a C++ STL set with a custom ordering defined.
The idea was that when items get added to the set, they're naturally ordered as I want them.
However, what I've just realised is that the ordering predicate can change as time goes by.
Presumably, the items in the set will then no longer be in order.
So two questions really:
Is it harmful that the items would then be out of order? Am I right in saying that the worst that can happen is that new entries may get put into the wrong place (which, actually, I can live with)? Or could this cause crashes, lost entries, etc.?
Is there a way to "refresh" the ordering of the set? You can't seem to use std::sort() on a set. The best I can come up with is dumping out the contents to a temp container and re-add them.
Any ideas?
Thanks,
John
set uses the ordering to look up items. If you insert N items according to ordering1 and then insert an item according to ordering2, the set cannot tell whether that item is already present.
That would violate the class invariant that every item occurs in the set only once.
So yes, it does harm.
The only safe way to do this with the STL is to create a new set with the changed predicate. For example, you could do something like this when you need to re-sort the set with a new predicate:
std::set<int, NewPred> newset( oldset.begin(), oldset.end(), NewPred() );
This is actually implementation dependent.
The STL implementation can, and usually will, assume that the predicate used for sorting is stable (otherwise "sorted" would not be well defined). It is at least possible to construct a valid STL implementation that formats your hard drive when you change the behavior of the predicate instance.
So, yes, you need to re-insert the items into a new set.
Alternatively, you could construct your own container, e.g. a vector + sort + lower_bound for binary search. Then you could re-sort when the predicates behavior changes.
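A sketch of that sorted-vector alternative (the names are illustrative), where you own the invariant and can re-sort whenever the predicate's behaviour changes:

#include <algorithm>
#include <functional>
#include <vector>

// A sorted-vector "set": insert keeps it ordered via lower_bound, and the whole
// container can be re-sorted on demand when the comparison changes.
template <typename T, typename Compare = std::less<T>>
class SortedVector {
public:
    explicit SortedVector(Compare cmp = Compare()) : cmp_(cmp) {}

    void insert(const T& value) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), value, cmp_);
        data_.insert(pos, value);
    }

    bool contains(const T& value) const {
        return std::binary_search(data_.begin(), data_.end(), value, cmp_);
    }

    // Call this after the behaviour of the comparator changes.
    void resort(Compare new_cmp) {
        cmp_ = new_cmp;
        std::sort(data_.begin(), data_.end(), cmp_);
    }

private:
    std::vector<T> data_;
    Compare cmp_;
};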
I agree with the other answers, that this is going to break in some strange and hard to debug ways. If you go the refresh route, you only need to do the copy once. Create a tmp set with the new sorting strategy, add each element from the original set to the tmp set, then do
orig.swap(tmp);
This will swap the internals of the sets.
If this were me, I would wrap this up in a new class that handles all of the details, so that you can change implementations as needed. Depending on your access patterns and the number of times the sort order changes, the previously mentioned vector + sort + lower_bound solution may be preferable.
If you can live with an unordered set, then why are you adding them into a set in the first place?
The only case I can think of is where you just want to make sure the list is unique when you add them. If that's the case then you could use a temporary set to protect additions:
if (ts.insert (value).second) {
// insertion took place
realContainer.push_back (value);
}
An alternative, depending on how frequently you'll be modifying the entries in the set: test whether an entry would now land in a different location (using the set's compare functionality), and if its position would move, remove the old entry and re-add the new one.
As everyone else has pointed out, having the set unordered really smells bad, and I would also guess that it is possibly undefined behaviour according to the standard.
While this doesn't give you exactly what you want, boost::multi_index gives you similar functionality. Due to the way templates work, you will never be able to "change" the ordering predicate for a container; it is set in stone at compile time, unless you are using a sorted vector or something similar, where you are the one maintaining the invariant and you can sort it however you want at any given time.
Multi_index however gives you a way to order a set of elements based on multiple ordering predicates at the same time. You can then select views of the container that behave like an std::set ordered by the predicate that you care about at the time.
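For illustration, a minimal Boost.MultiIndex sketch with two orderings maintained at once (the element type and index choices here are just an example):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <string>

struct Item {
    int         id;
    std::string name;
};

namespace bmi = boost::multi_index;

// One container, kept ordered by id and by name at the same time.
using ItemSet = bmi::multi_index_container<
    Item,
    bmi::indexed_by<
        bmi::ordered_unique<bmi::member<Item, int, &Item::id>>,
        bmi::ordered_non_unique<bmi::member<Item, std::string, &Item::name>>
    >
>;

int main() {
    ItemSet items;
    items.insert(Item{2, "beta"});
    items.insert(Item{1, "alpha"});

    // Views that behave like a std::set ordered by the predicate you care about.
    const auto& byId   = items.get<0>();
    const auto& byName = items.get<1>();
    (void)byId;
    (void)byName;
}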
This can cause lost entries. When searching for an element in a set, the ordering operator is used; this means that if an element was placed to the left of the root, and the ordering operator now says it is to the right, that element will no longer be found.
Here's a simple test for you:
#include <functional>
#include <iostream>
#include <set>

struct comparer : public std::binary_function<int, int, bool>
{
    static enum CompareType {CT_LESS, CT_GREATER} CompareMode;
    bool operator()(int lhs, int rhs) const
    {
        if(CompareMode == CT_LESS)
        {
            return lhs < rhs;
        }
        else
        {
            return lhs > rhs;
        }
    }
};

comparer::CompareType comparer::CompareMode = comparer::CT_LESS;

typedef std::set<int, comparer> is_compare_t;

void check(const is_compare_t &is, int v)
{
    is_compare_t::const_iterator it = is.find(v);
    if(it != is.end())
    {
        std::cout << "HAS " << v << std::endl;
    }
    else
    {
        std::cout << "ERROR NO " << v << std::endl;
    }
}

int main()
{
    is_compare_t is;
    is.insert(20);
    is.insert(5);
    check(is, 5);
    comparer::CompareMode = comparer::CT_GREATER;
    check(is, 5);
    is.insert(27);
    check(is, 27);
    comparer::CompareMode = comparer::CT_LESS;
    check(is, 5);
    check(is, 27);
    return 0;
}
So, basically, if you intend to be able to find the elements you once inserted, you should not change the predicate used for insertions and finds.
Just a follow up:
While running this code, the Visual Studio debug libraries started throwing exceptions complaining that the "<" operator was invalid.
So, it does seem that changing the sort ordering is a bad thing. Thanks everyone!
1) Harmful - no. Result in crashes - no. The worst is indeed a non-sorted set.
2) "Refreshing" would be the same as re-adding anyway!