C++ Fixed Size Container to Store Most Recent Values

I would like to know what the most suitable data structure is for the following problem in C++.
I want to store 100 floats ordered by recency, so when I add (push) a new item, the other elements are moved up one position. Every time an event is triggered I receive a value and add it to my data structure.
When the number of elements reaches 100, I would like to remove (pop) the item at the end (the oldest).
I also want to be able to iterate over all the elements and perform some mathematical operations on them.
I have looked at all the standard C++ containers but none of them fulfill all my needs. What's the easiest way to achieve this with standard C++ code?

You want a circular buffer. You can use Boost's implementation or make your own by allocating an array, and keeping track of the beginning and end of the used range. This boils down to doing indexing modulo 100.
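For illustration, a minimal hand-rolled sketch of that idea (the class and member names here are just illustrative, not Boost's API):

#include <cstddef>

// Fixed-capacity ring buffer holding the 100 most recent floats.
class RecentValues {
public:
    void push(float f) {
        buf_[head_] = f;                 // overwrite the oldest slot when full
        head_ = (head_ + 1) % 100;       // advance the write position modulo 100
        if (size_ < 100) ++size_;
    }
    std::size_t size() const { return size_; }
    // i == 0 is the oldest stored value, i == size() - 1 the newest.
    float operator[](std::size_t i) const {
        return buf_[(head_ + 100 - size_ + i) % 100];
    }
private:
    float buf_[100];
    std::size_t head_ = 0;               // next slot to write
    std::size_t size_ = 0;               // how many slots are currently filled
};

Iterating is then a plain loop from 0 to size() - 1.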

Without creating your own or using a library, std::vector is the most efficient standard data structure for this. Once it has reached its maximum size, there will be no more dynamic memory allocations. The cost of shifting 100 floats up by one is trivial compared to the cost of a dynamic memory allocation (this is why std::list is a slow data structure for this problem). There is no push_front function for vector; instead you have to use v.insert(v.begin(), f).
Of course, this assumes what you are doing is performance-critical, which it probably isn't. In that case I would use std::deque for more convenient usage.
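A rough sketch of that vector variant, assuming the newest value is kept at the front (the names are illustrative):

#include <vector>

std::vector<float> recent;                   // newest value at the front

void addValue(float f) {
    recent.insert(recent.begin(), f);        // push_front equivalent for vector
    if (recent.size() > 100)
        recent.pop_back();                   // drop the oldest value
}

Calling recent.reserve(101) once up front avoids even the initial reallocations.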

Just saw that you need to iterate over them. Use a list.
Your basic function would look something like this
#include <list>

std::list<int> list100;   // most recent value at the back

void addToList(int value) {
    list100.push_back(value);
    if (list100.size() > 100) {
        list100.pop_front();
    }
}
Iterating over them is easy as well:
int sum = 0;
for (int val : list100) {
    sum += val;
}
// Average, or whatever you need to do
Obviously, if you're using something besides int, you'll need to change that. Although this adds a little bit more functionality than you need, it's very efficient since it's a doubly linked list.
http://www.cplusplus.com/reference/list/list/

You can use std::array, std::deque, std::list, or std::priority_queue.

A map (std::map) should be able to solve your requirement. Use the value you receive as the key and the current push number nPushCount, which gets incremented whenever you add an element to the map, as the mapped value.
When adding a new element to the map, if you have fewer than 100 elements, just add the number as the key and nPushCount as the value.
If you already have 100 elements, check whether the number exists in the map and do the following:
If the number already exists in the map, add the number as key and nPushCount as value;
If it doesn't, delete the entry with the lowest nPushCount as its value and then add the desired number with the updated nPushCount.
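A rough sketch of that bookkeeping, assuming float values and a simple linear scan to find the entry with the lowest push count (all names here are illustrative):

#include <map>

std::map<float, int> recent;   // value -> push count at which it was last added
int nPushCount = 0;

void add(float value) {
    ++nPushCount;
    if (recent.size() >= 100 && recent.find(value) == recent.end()) {
        // Full and the value is new: evict the entry with the lowest push count.
        std::map<float, int>::iterator oldest = recent.begin();
        for (std::map<float, int>::iterator it = recent.begin(); it != recent.end(); ++it)
            if (it->second < oldest->second)
                oldest = it;
        recent.erase(oldest);
    }
    recent[value] = nPushCount;   // insert, or refresh the recency of an existing value
}

Note that this keeps the values unique, so duplicate values collapse into a single entry.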

Related

C++ unordered_map or unordered_set : What to use if I wish to keep an "isVisited" data structure

I want to keep a data structure for storing all the elements that I have seen till now. Considering that keeping an array for this is out of the question, as elements can be on the order of 10^9, what data structure should I use to achieve this: unordered_map or unordered_set in C++?
Maximum elements that will be visited in worst case : 10^5
-10^9 <= element <= 10^9
As #MikeCAT said in the comments, a map would only make sense if you wanted to store additional information about the element or the visitation. But if you wanted only to store the truth value of whether the element has been visited or not, the map would look something like this:
// if your elements were strings
std::unordered_map<std::string, bool> isVisited;
and then this would just be a waste of space. Storing the truth value is redundant if the mere presence of the string within the map already indicates that it has been visited. Let's see a comparison:
std::unordered_map<std::string, bool> isVisitedMap;
std::unordered_set<std::string> isVisitedSet;
// Visit some places
isVisitedMap["madrid"] = true;
isVisitedMap["london"] = true;
isVisitedSet.insert("madrid");
isVisitedSet.insert("london");
// Maybe the information expires so you want to remove them
isVisitedMap["london"] = false;
isVisitedSet.erase("london");
Now the elements stored in each structure will be:
For the map:
{{"london", false}, {"madrid", true}} <--- 2 entries, each dragging along a bool
For the set:
{"madrid"} <--- 1 element. Much better
In a project in which I had a binary tree converted to a binary DAG for optimization purposes (GRAPHGEN) I passed the exploration function a map from node pointers to bool:
std::map<BinaryDrag<conact>::node*, bool> &visited_fl
The map kept track of the pointers in order not to go through the same nodes again when doing multiple passes.
You could use a std::unordered_map<Value, bool>.
I want to keep a data structure for storing all the elements that I have seen till now.
A way to re-phrase that is to say "I want a data structure to store the set of all elements that I've seen till now". The clue is in the name. Without more information, std::unordered_set seems like a reasonable choice to represent a set.
That said, in practice it depends on details like what you're planning to do with this set. An array can be a good choice as well (yes, even for billions of elements), other set implementations may be better, and maps can be useful in some use cases.
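A minimal sketch of the set-based bookkeeping for the integer elements in the question:

#include <unordered_set>

std::unordered_set<long long> visited;       // every element seen so far

bool firstVisit(long long element) {
    // insert() returns a pair; .second is false if the element was already present
    return visited.insert(element).second;
}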

C++ efficient way to store and update sorted items

I have an operation that continuously generates random solutions (std::vector<float>). I evaluate the solutions against a mathematical function to see their usefulness (float). I would like to keep the top 10 solutions at all times. What would be the most efficient way to do this in C++?
I need to store both the solutions (std::vector) and their usefulness (float). I am performing several hundred thousand evaluations, so I need an efficient solution.
Edit:
I am aware of sorting methods. I am looking for methods other than sorting and storing all the values, i.e. better data structures, if any.
You evaluate the float score() function for the current std::vector<T> solution and store the two together in a std::pair<vector<T>, float>.
You use a std::priority_queue< pair<vector<T>, float> > to store the 10 best solutions based on their score, together with the score itself. std::priority_queue is a heap, so it allows you to extract its top element according to a compare function; set the comparison up so that the lowest-scoring of the stored pairs is on top (a min-heap).
Store the first 10 pairs; then, for each new one, compare it with the top of the heap. If score(new) > score(10th), push(new) into the priority_queue p and call p.pop() to get rid of the old 10th element.
You keep doing this inside a loop until you run out of vector<T> solutions.
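A sketch of that approach, with one small change for convenience: the score is put first in the pair so the default pair comparison orders by score, and std::greater is used so the lowest-scoring of the kept solutions sits on top (a min-heap):

#include <functional>
#include <queue>
#include <utility>
#include <vector>

typedef std::pair<float, std::vector<float> > Scored;   // (score, solution)

// Min-heap: top() is the worst of the best 10 kept so far.
std::priority_queue<Scored, std::vector<Scored>, std::greater<Scored> > best;

void offer(const std::vector<float>& solution, float score) {
    if (best.size() < 10) {
        best.push(Scored(score, solution));
    } else if (score > best.top().first) {
        best.pop();                          // drop the current 10th best
        best.push(Scored(score, solution));
    }
}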
Have a vector of pairs, where one element of each pair is the solution and the other is its usefulness. Then write a custom comparator to compare elements of the vector.
Add each new element at the end, then sort the vector and remove the last element.
As #user4581301 mentioned in the comments, for 10 elements you don't need to sort. Just traverse the vector every time, or you can also perform an ordered insert into the vector (see the sketch after the links below).
Here are some links to help you:
https://www.geeksforgeeks.org/sorting-vector-of-pairs-in-c-set-1-sort-by-first-and-second/
Comparator for vector<pair<int,int>>
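As mentioned above, a rough sketch of the ordered-insert variant, keeping at most 10 entries sorted best first (the names are illustrative):

#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<float, std::vector<float> > Scored;   // (usefulness, solution)
std::vector<Scored> top10;                               // kept sorted, best first

bool betterThan(const Scored& a, const Scored& b) { return a.first > b.first; }

void offer(const Scored& candidate) {
    // Find the insertion point that keeps the vector sorted in descending usefulness.
    std::vector<Scored>::iterator pos =
        std::lower_bound(top10.begin(), top10.end(), candidate, betterThan);
    top10.insert(pos, candidate);
    if (top10.size() > 10)
        top10.pop_back();                                // drop the now-11th entry
}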

Remove elements from first set element which second set contains without iteration

I have two sets of pairs (I cannot use C++11):
std::set<std::pair<int,int> > first;
std::set<std::pair<int,int> > second;
and I need to remove from the first set all elements that are also in the second set (if the first contains an element from the second, remove it). I can do this by iterating through the second set and, if the first set contains the same element, erasing it from the first, but I wonder: is there a way to do this without iteration?
If I understand correctly, basically you want to calculate the difference of first and second. There is an <algorithm> function for that.
std::set<std::pair<int, int> > result;
std::set_difference(first.begin(), first.end(), second.begin(), second.end(), std::inserter(result, result.end()));
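std::set_difference lives in <algorithm> and std::inserter in <iterator>. If first itself should end up holding the difference, you can then swap the result back into it:
first.swap(result);   // first now contains only the elements that were not in second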
Yes, you can.
If you want to remove, not just detect, there is another <algorithm> function for that: remove_copy_if():
http://www.cplusplus.com/reference/algorithm/remove_copy_if/
IMHO it's not so difficult to understand how it works.
I wonder is there a way to do this without iteration.
No. Internally, sets are balanced binary trees; there's no way to operate on them without iterating over the structure. (I assume you're interested in the efficiency of the implementation, not the convenience in code, so I've deliberately ignored library routines that must iterate internally.)
Sets are sorted, though, so you could do one iteration over each, removing as you go (so the number of operations is the sum of the set sizes) instead of an iteration over one set plus a lookup for each element (where the number of operations is the number of elements you're iterating over times log base 2 of the number of elements in the other set). Only if one of your sets is much smaller than the other will the iterate/find approach win out. If you look at your library's implementation of the set_difference function (mentioned in Amen's answer), it should show you how to do the two iterations nicely.
If you want something more efficient, you need to think about how to achieve that earlier: for example, storing your pairs as flags in two identically sized two-dimensional matrices such that you can AND the first with the negation of the second. Whether that's practical depends on the range of int values you're storing and whether the amount of memory needed is acceptable for your purposes.
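For reference, a sketch of the two-iterator, merge-style pass described above, erasing from first in place (pre-C++11 style, matching the question's constraint):

#include <set>
#include <utility>

typedef std::set<std::pair<int, int> > PairSet;

void subtract(PairSet& first, const PairSet& second) {
    PairSet::iterator it = first.begin();
    PairSet::const_iterator jt = second.begin();
    while (it != first.end() && jt != second.end()) {
        if (*it < *jt) {
            ++it;                  // only in first: keep it
        } else if (*jt < *it) {
            ++jt;                  // only in second: nothing to remove
        } else {
            first.erase(it++);     // in both: drop it from first
            ++jt;
        }
    }
}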

Inserting and removing elements from an array while maintaining the array to be sorted

I'm wondering whether somebody can help me with this problem. I'm using C/C++ to program and I need to do the following:
I am given an array P of floats, sorted in descending order (biggest first). It usually has a very big size, sometimes holding correlation values from 10-megapixel images. I need to iterate through the array until it is empty. Within the loop there is additional processing taking place.
The gist of the problem is that at the start of the loop I need to remove the elements with the maximum value from the array, check certain conditions, and if they hold, reinsert the elements into the array after decreasing their values. However, I want the array to remain efficiently sorted after the reinsertion.
Can somebody point me towards a way of doing this? I have tried the naive approach of re-sorting every time I insert, but that seems really wasteful.
Change the data structure. Repeatedly accessing the largest element, and then quickly inserting new values, in such a way that you can still efficiently repeatedly access the largest element, is a job for a heap, which may be fairly easily created from your array in C++.
BTW, please don't talk about "C/C++". There is no such language. You're instead making vague implications about the style in which you're writing things, most of which will strike experienced programmers as bad.
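A rough sketch of that heap-based loop using the standard heap algorithms, assuming the correlation values live in a std::vector<float> (the adjustment and the condition check are placeholders):

#include <algorithm>
#include <vector>

void process(std::vector<float>& p) {
    std::make_heap(p.begin(), p.end());            // largest value at p.front()
    while (!p.empty()) {
        std::pop_heap(p.begin(), p.end());         // move the maximum to the back
        float value = p.back();
        p.pop_back();
        // ... additional processing on value ...
        bool conditionsHold = false;               // placeholder for your checks
        if (conditionsHold) {
            p.push_back(value - 0.1f);             // reinsert with a decreased value
            std::push_heap(p.begin(), p.end());    // restore the heap in O(log n)
        }
    }
}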
I would look into std::priority_queue (http://www.cplusplus.com/reference/stl/priority_queue/), as it is designed to do just this.
You could use a binary search to determine where to insert the changed value after you removed it from the array. Note that inserting or removing at the front or somewhere in the middle is not very efficient either, as it requires moving all items with a higher index up or down, respectively.
ISTM that you should rather put your changed items into a new array and sort that once, after you have finished iterating over the original array. If memory is a problem and you really have to do things in place, change the values in place and only sort once.
I can't think of a better way to do this. Keeping the array sorted all the time seems rather inefficient.
Since the array is already sorted, you can use a binary search to find the location to insert the updated value. C++ provides std::lower_bound or std::upper_bound for this purpose, C provides bsearch. Just shift all the existing values up by one location in the array and store the new value at the newly cleared spot.
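Since the array is sorted biggest-first, the binary search needs a descending comparison; a minimal sketch with a std::vector:

#include <algorithm>
#include <functional>
#include <vector>

void reinsert(std::vector<float>& p, float newVal) {
    // p is sorted in descending order, so search with std::greater.
    std::vector<float>::iterator pos =
        std::lower_bound(p.begin(), p.end(), newVal, std::greater<float>());
    p.insert(pos, newVal);   // shifts the later elements up by one position
}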
Here's some pseudocode that may work decently if you aren't decreasing the removed values by much:
For example, say you're processing the element with the maximum value in the array, and say the array is sorted in descending order (largest first).
Remove array[0].
Let newVal = array[0] - adjustment, where adjustment is the amount you're decreasing the value by.
Now loop through, adjusting only the values you need to:
Pseudocode:
i = 0
while (i + 1 < n && newVal < array[i + 1]) {   // n = number of elements
    array[i] = array[i + 1];   // shift the larger value one slot toward the front
    i++;
}
array[i] = newVal;             // place the decreased value in the freed slot
Again, if you're not decreasing the removed values by a large amount (relative to the values in the array), this could work fairly efficiently.
Of course, the generally better alternative is to use a more appropriate data structure, such as a heap.
Maybe using another temporary array could help.
This way you can first sort the "changed" elements alone.
After that, just do a regular O(n) merge of the two sorted sub-arrays into the temp array, and copy everything back to the original array.
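A sketch of that approach, assuming both the original array and the changed elements are kept biggest-first in std::vectors:

#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

void mergeChanged(std::vector<float>& original, std::vector<float>& changed) {
    // Sort only the (presumably much smaller) set of changed items.
    std::sort(changed.begin(), changed.end(), std::greater<float>());
    std::vector<float> merged;
    merged.reserve(original.size() + changed.size());
    // O(n) merge of the two descending ranges into the temporary array.
    std::merge(original.begin(), original.end(),
               changed.begin(), changed.end(),
               std::back_inserter(merged), std::greater<float>());
    original.swap(merged);   // "copy everything back" to the original
}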

How to associate a number with another number without using an array

Let's say we have read these values:
3
1241
124515
5322353
341
43262267234
1241
1241
3213131
And I have an array like this (with the elements above):
a[0]=1241
a[1]=124515
a[2]=43262267234
a[3]=3
...
The thing is that the elements' order in the array is not constant (I have to change it somewhere else in my program).
How can I know at which positions an element appears in the read input?
Note that I can not do:
vector <int> a[1000000000000];
a[number].push_back(all_positions);
Because a will be too large (there's a memory restriction). (Let's say I have only 3000 elements, but their values are from 0 to 2^32.)
So, in the example above, I would want to know all the positions 1241 appears at, without iterating again through all the read elements.
In other words, how can I associate to the number "1241" the positions "1,6,7" so I can simply access them in O(k), where k is the number of positions at which the element appears?
If there's no O(k) solution, I want to know what the optimal one is ...
I don't know if I've made myself clear. If not, just say it and I'll update my question :)
You need to use some sort of dynamic array, like a vector (std::vector) or other similar containers (std::list, maybe, it depends on your needs).
Such data structures are safer and easier to use than C-style array, since they take care of memory management.
If you also need to look for an element in O(1) you should consider using some structures that will associate both an index to an item and an item to an index. I don't think STL provides any, but boost should have something like that.
If O(log n) is a cost you can afford, also consider std::map
You can use what is commonly referred to as a multimap, that is, a map that stores a key together with multiple values. Look-up is O(log n).
If you're working with Visual Studio, it provides its own hash_multimap; otherwise, may I suggest using boost::unordered_map with a list as your value?
You don't need a sparse array of 1000000000000 elements; use an std::map to map positions to values.
If you want bi-directional lookup (that is, you sometimes want "what are the indexes for this value?" and sometimes "what value is at this index?") then you can use a boost::bimap.
Things get further complicated as you have values appearing more than once. You can sacrifice the bi-directional lookup and use a std::multimap.
You could use a map for that. Like:
std::map<int, std::vector<int>> MyMap;
So every time you encounter a value while reading the file, you append its position to the map. Say X is the value you read and Y is its position; then you just do
MyMap[X].push_back( Y );
Instead of your array, use
std::map<int, vector<int> > a;
You need an associative collection, but you might want to associate a key with multiple values.
You can use std::multimap< int, int >
or
you can use std::map< int, std::set< int > >
I have found in practice the latter is easier for removing items if you just need to remove one element. It is unique on key-value combinations but not on key or value alone.
If you need higher performance then you may wish to use a hash_map instead of map. For the inner collection, though, you will not get much performance from using a hash, as you will have very few duplicates, and it is better to use std::set.
There are many implementations of hash_map, and it is in the new standard. If you don't have the new standard, go for boost.
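A brief sketch of both options for the value-to-positions mapping (the names are illustrative):

#include <map>
#include <set>
#include <utility>

std::multimap<int, int> positionsA;           // value -> one position per entry
std::map<int, std::set<int> > positionsB;     // value -> set of positions

void record(int value, int position) {
    positionsA.insert(std::make_pair(value, position));
    positionsB[value].insert(position);
}

// positionsA.equal_range(1241) yields iterators over all positions of 1241;
// positionsB[1241] gives them as a set directly.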
It seems you need a std::map<int,int>. You can store the mapping such as 1241->0, 124515->1, etc. Then perform a lookup on this map to get the array index.
Besides the std::map solution offered by others here (O(log n)), there's the approach of a hash map (implemented as boost::unordered_map or std::unordered_map in C++0x, supported by modern compilers).
It would give you O(1) lookup on average, which often is faster than a tree-based std::map. Try for yourself.
You can use a std::multimap to store both a key (e.g. 1241) and multiple values (e.g. 1, 6 and 7).
An insert has logarithmic complexity, but you can speed it up if you give the insert method a hint where it can insert the item.
For O(1) lookup you could hash the number to find its entry (key) in a hash map (boost::unordered_map, dictionary, stdex::hash_map, etc.).
The value could be a vector of the indices at which the number occurs, or a 3000-bit array (375 bytes) in which the bit for each index where the number (key) occurs is set.
boost::unordered_map<unsigned long, std::vector<unsigned long>> myMap;
for (unsigned long i = 0; i < sizeof(a)/sizeof(*a); ++i)
{
    myMap[a[i]].push_back(i);   // record each position i at which the value a[i] was read
}
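Looking up all positions of a value is then an average O(1) hash lookup, e.g.:

boost::unordered_map<unsigned long, std::vector<unsigned long>>::const_iterator it = myMap.find(1241);
if (it != myMap.end())
{
    // it->second holds every index at which 1241 was read
}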
Instead of storing an array of integers, you could store an array of structures, each containing the integer value and all of its positions in an array or vector.