Moving values between lockfree lists - concurrency

Background
I am trying to design and implement lock-free hashmap using chaining method in C++. Each hash table cell is supposed to contain lockfree list. To enable resizing, my data structure is supposed to contain two arrays - small one which is always available and a bigger one for resizing, when the smaller one is no longer sufficient. When the bigger one is created I would like the data stored in small one to be transfered to bigger one by one, whenever any thread does something with the data structure (adds element, searches or removes one). When all data is transfered, the bigger array is moved in place of smaller and the latter one is deleted. The cycle repeats whenever the array needs to be enlarged.
Problem
As mentioned before, each array is supposed to conatin lists in cells. I am trying to find a way to transfer a value or node from one lockfree list to another in such a manner that would keep value visible in any (or both) of the lists. It is needed to ensure that search in hash map won't give the user false negatives. So my questions are:
Is such lockfree list implementation possible?
If so, what would be the general concept of such list and "moving node/value" operation? I would be thankful for any pseudocode, C++ code or scientific article describing it.

To be able to resize the array, while maintaining the lock-free progress guarantees, you will need to use operation descriptors. Once the resize starts, add a descriptor that contains references to the old and the new arrays.
On any operation (add, search, or remove):
Add operation, search the old array, if the element already exists, then move the element to the new array before returning. Indicate, with a descriptor or a special null value that the element has already been moved so that other threads don't attempt the move again
Search, search the old array and move the element as indicated above.
Remove - Remove too will have to search the old array first.
Now the problem is that you will have a thread that has to verify that the move is complete, so that you can remove the descriptor and free up the old array. To maintain lock-freedom, you will need to have all active threads attempt to do this validation, thus it becomes very expensive.
You can look at:
https://dl.acm.org/citation.cfm?id=2611495
https://dl.acm.org/citation.cfm?id=3210408

Related

Advice on how to implement std::vector using a circular array?

I am to write a C++ program that :
"Implements the vector ADT by means of an extendable array used in a circular fashion, so that insertions and deletions at the beginning and end run in constant time. (So not O(n)). Print the circular array before and after each insertion and deletion, You cannot use the STL."
This task seems very confusing to me. A std::vector is implemented using a dynamic array that is based off the concept of a stack, correct? Performing a deletion or insertion at the front seems to me that this should be implemented as a Queue or maybe a Dequeue, not a Vector. Also, a circular array would mean that when data is pushed onto an array that is Full, old data becomes overwritten, right? So when should I know to expand the vector's capacity?
If I'm not making sense here, Basically I need help in understanding how I should go about implementing a dynamic circular array..
Yes, this is a homework assignment. No, I do not expect anyone to provide code for me, I only wish for someone to give me a push in the right direction as to how I should think about implementing this. Thank you.
I think you are actually being asked to implement deque. The point of the "circularity" is that in normal vector you cannot add an element at the beginning since there is no free space and you would have to move all other elements to the right. So what you can do is you simulate a circle by putting the element to the end the base array and remember that's where the first element is.
Example: 2, 3, -, -, 1 where 1 is first and 3 is last
So, basically you insert elements circullary, and remember where the first and the last elements are so you can add to beginning/end in O(1). Also when the array is full, you have to move all the elements to a larger one. If you double the size, you still get amortized time of O(1)
1) m_nextIn and m_nextOut - data attributes of class queue;
I find it useful to have two integers, with label m_nextIn and m_nextOut ... these identify where in the circular array you 'insert' the next (i.e. youngest) obj instance into the queue, and where you 'delete' the oldest obj instance from the queue.
These two items also provide constant time insert and delete.
Don't get confused as to where the beginning or end of the queue is. The array starts at index 0, but this is not the beginning of your queue.
The beginning of your queue is at nextIn (which probably is not 0, but may be). Technique also known as round-robin (a research term).
2) empty and full - method attributes
Determining queue full / empty can be easily computed from m_nextIn and m_nextOut.
3) extendable
Since you are prohibited from using vector (which itself is extendable) you must implement this functionality yourself.
Note about your comment: The "dynamic memory" concept is not related to stack. (another research term)
Extendable issues occur when your user code invokes the 'insert' AND the array is already full. (capture this test effort) You will need to detect this issue, then do 4 things:
3.1) allocate a new array (use new, and simply pick an appropriate size.)
Hint - std::vector() doubles it's capacity each time a push_back() would overflow the current capacity
3.2) transfer the entire contents of the array to the new array, fixing all the index's as you go. Since the new array is bigger, just insert trivially.
3.3) delete the old array - i.e. you copied from the old array to the new array, so do you 'delete' them? or simply delete the array?
3.4) finish the 'insert' - you were in the middle of inserting another instance, right?
Good luck.

Efficiently processing large number of unique elements (std::set vs other containers)

I have std::set having large number unique objects as its elements.
In the main thread of program:
I take some objects from the set
Assign data to be processed to each of them
Remove those objects from set
And finally pass the objects to threads in threadpool for processing
Once those threads finish processing objects, they adds them back to the set. (So that in the next iteration, main thread can again
assigns next batch of data to those objects for processing)
This arrangement works perfect. But if I encounter error while adding back object to the set (for example, std::set.insert() throws bad_alloc) then it all goes on toss.
If I ignore that error and proceed, then there is no way for the object to get back in the processing set and it remains out of the program flow forever causing memory leaks.
To address this issue I tried to not to remove object from set. Instead, have a member flag that indicates the object is 'being processed'. But in that case the problem is, main thread encounters 'being processed' objects again and again while iterating through all elements of set. And it badly hampers performance (Number of objects in set are quite large).
What are better alternatives here?
Can std::list be used instead of std::set? List will not have bad_alloc problem while adding back element, as it just needs to assign pointers while adding element to list. But how can we make list elements unique? If at all we achieve it, will it be efficient as std::set?
Instead of removing and adding back elements to the std::set, is there any way to move element to the start or end of the set? So that unprocessed objects and processed will accumulate together towards start and end of the set.
Any other solution please?

C++ Deleting objects from memory

Lets say I have allocated some memory and have filled it with a set of objects of the same type, we'll call these components.
Say one of these components needs to be removed, what is a good way of doing this such that the "hole" created by the component can be tested for and skipped by a loop iterating over the set of objects?
The inverse should also be true, I would like to be able to test for a hole in order to store new components in the space.
I'm thinking menclear & checking for 0...
boost::optional<component> seems to fit your needs exactly. Put those in your storage, whatever that happens to be. For example, with std::vector
// initialize the vector with 100 non-components
std::vector<boost::optional<component>> components(100);
// adding a component at position 15
components[15].reset(component(x,y,z));
// deleting a component at position 82
componetnts[82].reset()
// looping through and checking for existence
for (auto& opt : components)
{
if (opt) // component exists
{
operate_on_component(*opt);
}
else // component does not exist
{
// whatever
}
}
// move components to the front, non-components to the back
std::parition(components.begin(), components.end(),
[](boost::optional<component> const& opt) -> bool { return opt; });
The short answer is it depends on how you store it in memmory.
For example, the ansi standard suggests that vectors be allocated contiguously.
If you can predict the size of the object, you may be able to use a function such as size_of and addressing to be able to predict the location in memory.
Good luck.
There are at least two solutions:
1) mark hole with some flag and then skip it when processing. Benefit: 'deletion' is very fast (only set a flag). If object is not that small even adding a "bool alive" flag can be not so hard to do.
2) move a hole at the end of the pool and replace it with some 'alive' object.
this problem is related to storing and processing particle systems, you could find some suggestions there.
If it is not possible to move the "live" components up, or reorder them such that there is no hole in the middle of the sequence, then the best option if to give the component objects a "deleted" flag/state that can be tested through a member function.
Such a "deleted" state does not cause the object to be removed from memory (that is just not possible in the middle of a larger block), but it does make it possible to mark the spot as not being in use for a component.
When you say you have "allocated some memory" you are likely talking about an array. Arrays are great because they have virtually no overhead and extremely fast access by index. But the bad thing about arrays is that they aren't very friendly for resizing. When you remove an element in the middle, all following elements have to be shifted back by one position.
But fortunately there are other data structures you can use, like a linked list or a binary tree, which allow quick removal of elements. C++ even implements these in the container classes std::list and std::set.
A list is great when you don't know beforehand how many elements you need, because it can shrink and grow dynamically without wasting any memory when you remove or add any elements. Also, adding and removing elements is very fast, no matter if you insert them at the beginning, in the end, or even somewhere in the middle.
A set is great for quick lookup. When you have an object and you want to know if it's already in the set, checking it is very quick. A set also automatically discards duplicates which is really useful in many situations (when you need duplicates, there is the std::multiset). Just like a list it adapts dynamically, but adding new objects isn't as fast as in a list (not as expensive as in an array, though).
Two suggestions:
1) You can use a Linked List to store your components, and then not worry about holes.
Or if you need these holes:
2) You can wrap your component into an object with a pointer to the component like so:
class ComponentWrap : public
{
Component component;
}
and use ComponentWrap.component == null to find if the component is deleted.
Exception way:
3) Put your code in a try catch block in case you hit a null pointer error.

Determining the best ADT for a priority queue with changeable elements (C++)

First post here and I'm a beginner - hope I'm making myself useful...
I'm trying to find and understand the ADT/concept that does the job I'm after. I'm guessing it's already out there.
I have an array/list/tree (container to be decided) of objects each of which has a count associated with how much it hasn't been used over iterations of a process. As iterations proceed the count for each object accumulates by 1. The idea is that sooner or later I'm going to need the memory that any unused objects are using so I'll delete them to make space for an object not in RAM (which will have an initial count of '0') - But, if it turns out that I use an object that is still in memory it's count is reset to '0', and I pat myself on the back for not having had to access the disk for its contents.
A cache?
The main process loop would have something similar to the following in it:
if (object needs to be added && (totalNumberOfObjects > someConstant))
object with highest count deleted from RAM and the (heap??)
newObject added with a count of '0'
if (an object already in RAM is accessed by the process)
accessedObject count is set to '0'
for (All objects in RAM)
count++
I could bash about for a (long and buggy time) and build my own mess, but I thought it'd be interesting to learn the most efficient way from word go.
Something like a heap?
You could use a heap for this, but I think it would be overkill. It sounds like you're not going to have a lot of different values for the counts, and you'll have a lot of objects with each count. If that's true, then you only need thread the objects onto a list of objects with the same count. These lists are themselves arranged in a dequeue (or 'deque' as C++ insists on calling it).
The key here is that you need to increment the count of all objects, and presumably you want that to be O(1) if possible, rather than O(N). And it is possible: the key is that each list's header contains also the difference of its count from the next smaller count. The header of the list with the smallest count contains a delta from 0, which is the smallest count. To increment the count of all objects, you only have to increase this single number by one.
To set an object's count to 0, you remove the object from its list (which means you always need to refer to objects by their list iterator, or you need to implement your own intrusive linked list), and either (1) add it to the bottom list, if that list has a count of 0, or (2) create a new bottom list with a count of 0 containing only that object.
The procedure for creating a new object is the same, except that you don't have to unlink it from its current list.
To evict an object from memory, you choose the object at the head of the top list (which is the list with the largest count). If that list becomes empty, you pop it off the dequeue. If you need more memory, you can repeat this operation.
So all operations, including "increment all counts", are O(1). Unfortunately, the storage overhead is two pointers per object, plus two pointers and an integer per unique count (at worst, this is the same as the number of objects, but presumably in practice it's much less). Since it's hard to imagine any other algorithm which uses less than one pointer plus a count for each object, this is probably not even a space-time tradeoff; the additional space requirements are minimal.

Efficient way to organize used and unused elements in a large concurrent array

I have about 18 million elements in an array that are initialized and ready to be used by a simple manager called ElementManager (this number will later climb to a little more than a billion in later iterations of the program). A class, A, which must use the elements communicates with ElementManager that returns the next available element for consumption. That element is now in use and cannot be reused until recycled, which may happen often. Class A is concurrent, that is, it can ask ElementManager for an available element in several threads. The elements in this case is an object that stores three vertices to make a triangle.
Currently, the ElementManager is using Intel TBB concurrent_bounded_queue called mAllAvailableElements. There is also another container (a TBB concurrent_vector) that contains all elements, regardless of whether they are available for use or not, called mAllElements. Class A asks for the next available element, the manager tries to pop the next available element from the queue. The popped element is now in use.
Now when class A has done what it has to do, control is handed to class B which now has to iterate through all elements that are in use and create meshes (to take advantage of concurrency, the array is split into several smaller arrays to create submeshes which scales with the number of available threads - the reason for this is that creating a mesh must be done serially). For this I am currently iterating over the container mAllElements (this is also concurrent) and grabbing any element that is in use. The elements, as mentioned above, contain polygonal information to create meshes. Iteration in this case takes a long time as it has to check each element and query whether it is in use or not, because if it is not in use then it should not be part of a mesh.
Now imagine if only 1 million out of the possible 18 million elements were in use (but more than 5-6 million were recycled). Worse yet, due to constant updates to only part of the mesh (which happens concurrently) means the in use elements are fragmented throughout the mAllElements container.
I thought about this for quite some time now and one flawed solution that I came up with was to create another queue of elements named mElementsInUse, which is also a concurrent_queue. I can push any element that is now in use. Problem with this approach is that since it is a queue, any element in that queue can be recycled at any time (an update in a part of the mesh) and declared not in use and since I can only pop the front element, this approach fails. The only other approach I can think of is to defragment the concurrent_vector mAllElements every once in a while when no operations are taking place.
I think my approach to this problem is wrong and thus my post here. I hope I explained the problem in enough detail. It seems like a common memory management problem, but I cannot come up with any search terms to search for it.
How about using a bit vector to indicate which of your elements are in use? It's easy to partition it for parallel processing when building your full mesh, and you can use atomic operations on words in the vector and thus avoid locks.