Tracking elements for order in a std::map - c++

I have a std::map like so: std::map<UINT32,USER_DEFINED_X> with a variable number N of elements. This map is part of an overall application that runs on a real time framework. The map contains elements such that it includes times for when certain activities are supposed to occur. During each frame, the map is scanned to see if any of those times match up with current time. There is one condition that needs to be checked though before processing the activities. I need to check to see if the element that is going to be processed is the first one in the list that is being processed. I am not sure how to do that. One approach I thought about using was to create another temporary map/array where I would store the element that has been processed in order, then get the order from that temporary array/map?
Does anybody know of a better way I can conduct this operation?

Related

Moving values between lockfree lists

Background
I am trying to design and implement a lock-free hash map in C++ using the chaining method. Each hash table cell is supposed to contain a lock-free list. To enable resizing, my data structure is supposed to contain two arrays: a small one, which is always available, and a bigger one for resizing, used when the smaller one is no longer sufficient. Once the bigger one is created, I would like the data stored in the small one to be transferred to the bigger one element by element, whenever any thread does something with the data structure (adds an element, searches for one, or removes one). When all data has been transferred, the bigger array takes the place of the smaller one and the latter is deleted. The cycle repeats whenever the array needs to be enlarged.
Problem
As mentioned before, each array is supposed to contain lists in its cells. I am trying to find a way to transfer a value or node from one lock-free list to another in such a manner that the value stays visible in at least one of the lists (or both) at all times. This is needed to ensure that a search in the hash map won't give the user false negatives. So my questions are:
Is such a lock-free list implementation possible?
If so, what would be the general concept of such a list and of the "moving node/value" operation? I would be thankful for any pseudocode, C++ code or scientific article describing it.
To be able to resize the array, while maintaining the lock-free progress guarantees, you will need to use operation descriptors. Once the resize starts, add a descriptor that contains references to the old and the new arrays.
On any operation (add, search, or remove):
Add: search the old array first. If the element already exists there, move it to the new array before returning. Mark the old slot with a descriptor or a special null value to indicate that the element has already been moved, so that other threads don't attempt the move again.
Search: search the old array and move the element as described above.
Remove: this too has to search the old array first.
Now the problem is that some thread has to verify that the move is complete, so that you can remove the descriptor and free the old array. To maintain lock-freedom, all active threads need to attempt this validation, so it becomes very expensive.
You can look at:
https://dl.acm.org/citation.cfm?id=2611495
https://dl.acm.org/citation.cfm?id=3210408

Efficiently processing large number of unique elements (std::set vs other containers)

I have a std::set holding a large number of unique objects as its elements.
In the main thread of program:
I take some objects from the set
Assign data to be processed to each of them
Remove those objects from set
And finally pass the objects to threads in threadpool for processing
Once those threads finish processing the objects, they add them back to the set, so that in the next iteration the main thread can assign the next batch of data to those objects for processing.
This arrangement works perfectly. But if I encounter an error while adding an object back to the set (for example, std::set::insert() throws bad_alloc), everything falls apart.
If I ignore that error and proceed, there is no way for the object to get back into the processing set, and it remains outside the program flow forever, causing a memory leak.
To address this issue I tried not removing the objects from the set. Instead, each object has a member flag that indicates it is 'being processed'. But in that case the main thread encounters 'being processed' objects again and again while iterating through all elements of the set, and that badly hampers performance (the number of objects in the set is quite large).
What are better alternatives here?
Can std::list be used instead of std::set? A list will not have the bad_alloc problem when an element is added back, as it only needs to reassign pointers to link the element in. But how can we keep the list elements unique? And if we manage that, will it be as efficient as std::set?
Instead of removing elements from the std::set and adding them back, is there any way to move an element to the start or end of the set, so that unprocessed and processed objects accumulate at opposite ends of the set?
Any other solution please?
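One possibility that the question does not mention, shown here only as a hedged sketch: since C++17, std::set supports node handles. extract() unlinks an element without freeing its node, and re-inserting that node handle reuses the same allocation, so the re-insert step itself does not allocate and therefore cannot throw bad_alloc. The names Pool, takeFirst and giveBack are hypothetical, int stands in for the real object type, and synchronization between the main thread and the worker threads is left out.

```cpp
#include <set>
#include <utility>

// Hypothetical pool of work objects; int stands in for the real type.
using Pool = std::set<int>;

// Unlinks the first element from the pool and hands back its node.
// Returns an empty node handle if the pool is empty.
Pool::node_type takeFirst(Pool& pool) {
    if (pool.empty()) return Pool::node_type{};
    return pool.extract(pool.begin());
}

// Relinks a previously extracted node. No memory is allocated here,
// because the node still owns the storage it had inside the set.
void giveBack(Pool& pool, Pool::node_type node) {
    pool.insert(std::move(node));
}
```

A worker would hold the node handle while it processes the object and pass it to giveBack when done. Uniqueness is still enforced by the set: if an equal element were inserted in the meantime, the node would stay owned by the handle and be released when the handle is destroyed.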

Implementing a mutable ranking table in c++

In an event-driven simulator, I need to keep track of the popularity of a large number of content elements from a catalog. More specifically, I am interested in knowing the rank of any given content, i.e. its position in a list sorted by descending number of requests. I know that the number of requests per content only ever increases by one at a time, so there is no dramatic shift in the ranking. Furthermore, elements are inserted into or deleted from the catalog only on rare occasions, while requests are much more numerous and frequent. What is the best data structure to implement this?
These are the options that I have considered:
a std::map<ContentElement, unsigned int> mapping contents to the number of requests they received. Not a good choice, as it requires me to dump everything to a list and sort it whenever I want to know the ranking of a content, which is very often.
a boost::multi_index_container with two indexes, a hashed_unique for the ContentElement and an ordered_non_unique for the number of requests. This allows me to quickly retrieve a content in order to update its number of requests, and to keep the container sorted as I do this through a modify call. But my understanding of the ordered index is that it still forces me to iterate through all its elements in order to figure out the rank of a content; I could not find a simple way of extracting the position in the ranking from the ordered iterator.
a boost::bimap between content elements and ranking position, supported by an external sorted vector storing the number of requests per content. Essentially, the rank of a content would also be the index of the vector element holding its number of requests. This allows me to do everything I want to do (e.g., easily go from content to rank and vice versa), and re-sorting the vector after a new request should require at most two swaps in the bimap. However, it feels clumsy and error-prone, as I could easily lose sync between the map and the vector, and then everything would fall apart.
My gut tells me there must be a much simpler and more elegant way of handling this, but I could not find it. Can anyone help?
There is no need to do a full sort. The key insight here is that a ranking can only change by +1 or -1 when it is accessed. I would do the following...
Keep the element in a container of your choice, e.g.
map< elementId, elementInstance >
Maintain a linked list of element rankings, something like this:
list< rankingInstance >
The rankingInstance has a pointer to an elementInstance and the value of the current rank and current number of accesses. On access, you:
access the element in the map
get its current rank, and access count
update the count
using the current rank, access the linked list
check the neighbors
swap position in list if necessary
if swapping occurred, go back and update the two elements whose rank changed
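The steps above can be sketched roughly as follows. This is a minimal sketch and not the answerer's code: it swaps the linked list for a std::vector so that an element's rank is simply its index, and the names RankingTable, add, request, rank and at are hypothetical. It also loops instead of swapping once, since an element can overtake several neighbors that are tied on the same count.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical ranking table: rank 0 is the most requested element.
class RankingTable {
public:
    // Adds a new element with zero requests at the bottom of the ranking.
    void add(const std::string& id) {
        if (rankOf_.count(id)) return;
        rankOf_[id] = entries_.size();
        entries_.push_back(Entry{id, 0});
    }

    // Records one request, then swaps the element with its upper
    // neighbor for as long as it has strictly more requests.
    void request(const std::string& id) {
        std::size_t r = rankOf_.at(id);
        ++entries_[r].count;
        while (r > 0 && entries_[r].count > entries_[r - 1].count) {
            std::swap(entries_[r], entries_[r - 1]);
            rankOf_[entries_[r].id] = r;          // neighbor moved down
            rankOf_[entries_[r - 1].id] = r - 1;  // element moved up
            --r;
        }
    }

    // Content to rank, and rank to content.
    std::size_t rank(const std::string& id) const { return rankOf_.at(id); }
    const std::string& at(std::size_t r) const { return entries_.at(r).id; }

private:
    struct Entry {
        std::string id;
        unsigned count;
    };
    std::vector<Entry> entries_;                           // sorted by count, descending
    std::unordered_map<std::string, std::size_t> rankOf_;  // id -> index in entries_
};
```

Because both lookups go through one class that updates the vector and the map together, the two structures cannot drift out of sync from the caller's side. Deleting an element is not shown; it would shift the ranks of everything below it.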
It may seem too simple, but my suggestion is to use Bubble Sort on your list. Bubble Sort only compares and swaps adjacent elements, which matches your case: an element moves just one place up or down in the ranking. You can keep the ranking in a vector, with the 'Rank' as the index and the 'ContentHash' as the value. You will also need a map containing the 'Content' or a 'Content Reference'. I hope this very simple approach gives you some insight into your problem.

How to repeatedly insert elements into a sorted list fast

I do not have formal CS training, so bear with me.
I need to do a simulation, which can be abstracted to the following (omitting the details):
We have a list of real numbers representing the times of events. In each step, we
remove the first event, and
as a result of "processing" it, a few other events may get inserted into the list at a strictly later time
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n where n is some simulation parameter. While the event times are increasing, the time-difference of the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and lower bound (i.e. all the events will never be strongly clustered around a single point in time)
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant time insertion, then I have trouble finding the position where to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets and keep track of how many events there are in each bucket. I then process the buckets one by one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but only an approximation.
A min-heap might suit your needs. There's an explanation here and I think STL provides the priority_queue for you.
Insertion time is O(log N), removal is O(log N)
It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
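As a minimal sketch of that suggestion: std::priority_queue with std::greater as the comparison puts the smallest time on top. The alias EventQueue and the function processNext are hypothetical names, and the delays passed in stand for whatever follow-up events the real simulation would generate.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Min-heap of event times: std::greater makes top() the earliest event.
using EventQueue =
    std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Removes the earliest event and schedules one follow-up event per delay.
// Assumes the queue is not empty and all delays are positive, so every
// new event lies strictly later than the one just processed.
double processNext(EventQueue& events, const std::vector<double>& delays) {
    const double now = events.top();
    events.pop();                 // O(log N)
    for (double d : delays) {
        events.push(now + d);     // O(log N) each
    }
    return now;
}
```

The simulation loop would then call processNext repeatedly until the queue is empty or the end time is reached.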
I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue
A binary search tree is always sorted and has faster access times than a linear list. Search, insert and delete times are O(log(n)), provided the tree is kept balanced.
But it depends whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster. At the end of the process you then would copy the items to an array or a list and sort it.

Efficient way to organize used and unused elements in a large concurrent array

I have about 18 million elements in an array that are initialized and ready to be used by a simple manager called ElementManager (this number will climb to a little more than a billion in later iterations of the program). A class, A, which must use the elements, communicates with ElementManager, which returns the next available element for consumption. That element is now in use and cannot be reused until it is recycled, which may happen often. Class A is concurrent, that is, it can ask ElementManager for an available element from several threads. Each element in this case is an object that stores three vertices to make a triangle.
Currently, the ElementManager is using Intel TBB concurrent_bounded_queue called mAllAvailableElements. There is also another container (a TBB concurrent_vector) that contains all elements, regardless of whether they are available for use or not, called mAllElements. Class A asks for the next available element, the manager tries to pop the next available element from the queue. The popped element is now in use.
Now when class A has done what it has to do, control is handed to class B which now has to iterate through all elements that are in use and create meshes (to take advantage of concurrency, the array is split into several smaller arrays to create submeshes which scales with the number of available threads - the reason for this is that creating a mesh must be done serially). For this I am currently iterating over the container mAllElements (this is also concurrent) and grabbing any element that is in use. The elements, as mentioned above, contain polygonal information to create meshes. Iteration in this case takes a long time as it has to check each element and query whether it is in use or not, because if it is not in use then it should not be part of a mesh.
Now imagine that only 1 million out of the possible 18 million elements were in use (but more than 5-6 million had been recycled). Worse yet, constant updates to only part of the mesh (which happen concurrently) mean the in-use elements are fragmented throughout the mAllElements container.
I thought about this for quite some time now and one flawed solution that I came up with was to create another queue of elements named mElementsInUse, which is also a concurrent_queue. I can push any element that is now in use. Problem with this approach is that since it is a queue, any element in that queue can be recycled at any time (an update in a part of the mesh) and declared not in use and since I can only pop the front element, this approach fails. The only other approach I can think of is to defragment the concurrent_vector mAllElements every once in a while when no operations are taking place.
I think my approach to this problem is wrong and thus my post here. I hope I explained the problem in enough detail. It seems like a common memory management problem, but I cannot come up with any search terms to search for it.
How about using a bit vector to indicate which of your elements are in use? It's easy to partition it for parallel processing when building your full mesh, and you can use atomic operations on words in the vector and thus avoid locks.
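A minimal sketch of that idea, under stated assumptions: one bit per element, packed into 64-bit atomic words, with fetch_or and fetch_and to set and clear bits without locks. The class InUseBits and its member names are hypothetical, and element i is assumed to live at index i of mAllElements.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-use bitmap: bit i is set while element i is in use.
class InUseBits {
public:
    explicit InUseBits(std::size_t elementCount)
        : words_((elementCount + 63) / 64) {}  // all bits start cleared

    // Marks element i as in use; returns false if it was already in use.
    bool acquire(std::size_t i) {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        return (words_[i / 64].fetch_or(mask) & mask) == 0;
    }

    // Marks element i as free again (recycled).
    void release(std::size_t i) {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        words_[i / 64].fetch_and(~mask);
    }

    std::size_t wordCount() const { return words_.size(); }

    // Calls fn(index) for every in-use element whose bit lies in the
    // words [firstWord, lastWord). Each thread can take its own range.
    template <class Fn>
    void forEachInUse(std::size_t firstWord, std::size_t lastWord, Fn fn) const {
        for (std::size_t w = firstWord; w < lastWord; ++w) {
            std::uint64_t bits = words_[w].load();
            for (unsigned b = 0; bits != 0; ++b, bits >>= 1) {
                if (bits & 1) fn(w * 64 + b);
            }
        }
    }

private:
    std::vector<std::atomic<std::uint64_t>> words_;
};
```

When building submeshes, each thread would call forEachInUse on a disjoint range of words. A word that is entirely zero is skipped with a single load, which helps when the in-use elements are sparse. Each word is read as a snapshot, so a bit that changes during the scan may or may not be seen.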