I have an application that needs to store a sequence of voltage data; each entry is a pair {time, voltage}.
The time is not necessarily continuous: if the voltage doesn't change, I will not have any reading.
The problem is that I also need a function that looks up a timestamp, like getVoltageOfTimestamp(float2second(922.325)).
My solution is to have a deque that stores the pairs; then, every 30 seconds, I take a sample and store its deque index in a std::map keyed by the 30-second interval,
so inside getVoltageOfTimestamp(float2second(922.325)), I simply find the nearest interval_of_30_seconds to the desired time, move my deque pointer to the corresponding_index_of_deque, and iterate from there to find the correct voltage.
I am not sure whether a more 'computer scientist' solution exists here. Can anyone give me a clue?
You could use a binary search on your std::deque because the timestamps are in ascending order.
If you want to optimize for speed, you could also use a std::map<Timestamp, Voltage>. For finding an element, you can use upper_bound on the map and return the element before the one found by upper_bound. This approach uses more memory (because std::map<Timestamp, Voltage> has some overhead and it also allocates each entry separately).
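A minimal sketch of the upper_bound lookup described above. The aliases Timestamp and Voltage and the function name voltageAt are assumptions for illustration (the question does not show its types), and the sketch assumes the caller wants the most recent reading at or before the requested time.

```cpp
#include <cassert>
#include <iterator>
#include <map>

// Hypothetical aliases; the question does not name its types.
using Timestamp = double;  // seconds
using Voltage = double;

// Returns the most recent reading taken at or before t.
// Precondition: some reading exists with a timestamp <= t.
Voltage voltageAt(const std::map<Timestamp, Voltage>& readings, Timestamp t) {
    auto it = readings.upper_bound(t);  // first entry with key > t
    assert(it != readings.begin());     // otherwise nothing at or before t
    return std::prev(it)->second;       // the entry in effect at time t
}
```

Because a reading is only stored when the voltage changes, the entry just before the upper_bound result is the value that was in effect at the requested time.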
Rather than use a separate map, you can do a binary search directly on the deque to find the closest timestamp. Given the complexity requirements of a std::map, a binary search will be just as efficient as a map lookup (both are O(log N)) and won't require the extra overhead.
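The binary search on the deque could look roughly like this. The struct Reading and the function name voltageAtTime are hypothetical, and the sketch returns the latest reading at or before the requested time rather than the numerically closest one, which seems to match the "no reading while the voltage is unchanged" rule in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <iterator>

// Hypothetical entry type; the question only describes a {time, voltage} pair.
struct Reading {
    double time;     // seconds
    double voltage;
};

// Precondition: readings is sorted by ascending time and has an entry <= t.
double voltageAtTime(const std::deque<Reading>& readings, double t) {
    // First reading whose time is strictly greater than t.
    auto it = std::upper_bound(
        readings.begin(), readings.end(), t,
        [](double value, const Reading& r) { return value < r.time; });
    assert(it != readings.begin());  // otherwise nothing at or before t
    return std::prev(it)->voltage;
}
```

std::deque provides random-access iterators, so std::upper_bound runs in O(log N) here without any auxiliary index.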
Do you mind using C++0x features? If not, deque<tuple<Time, Voltage>> will do the job.
One way to improve on binary search is to exploit the sampling of your data. Assuming your samples arrive every 30 milliseconds, store the readings in a vector/list as you get them. In a second array, record the index into the first container every 30 seconds. Given a timestamp, look up the second array to get an index into the list of readings, go there, and check the few elements preceding or following it.
Hope this helps.
Given an input stream of numbers ranging from 1 to 10^5 (non-repeating) we need to be able to tell at each point how many numbers smaller than this have been previously encountered.
I tried to use the set in C++ to maintain the elements already encountered and then taking upper_bound on the set for the current number. But upper_bound gives me the iterator of the element and then again I have to iterate through the set or use std::distance which is again linear in time.
Can I maintain some other data structure or follow some other algorithm in order to achieve this task more efficiently?
EDIT: Found an older question about Fenwick trees that is helpful here. By the way, I have now solved this problem using segment trees, taking hints from @doynax's comment.
How to use Binary Indexed tree to count the number of elements that is smaller than the value at index?
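For reference, a Fenwick (binary indexed) tree over the value range might be sketched as follows. The class name SeenCounter and its method names are made up for illustration; the sketch assumes values lie in 1..maxValue and do not repeat, as stated in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fenwick (binary indexed) tree indexed by value, for values in 1..maxValue.
class SeenCounter {
public:
    explicit SeenCounter(std::size_t maxValue) : tree_(maxValue + 1, 0) {}

    // Records value and returns how many smaller values were seen before it.
    int insertAndCountSmaller(std::size_t value) {
        assert(value >= 1 && value < tree_.size());
        int smaller = prefixCount(value - 1);
        // Add 1 at position value; i & (~i + 1) isolates the lowest set bit.
        for (std::size_t i = value; i < tree_.size(); i += i & (~i + 1))
            ++tree_[i];
        return smaller;
    }

private:
    // Number of recorded values in 1..i.
    int prefixCount(std::size_t i) const {
        int total = 0;
        for (; i > 0; i -= i & (~i + 1))
            total += tree_[i];
        return total;
    }

    std::vector<int> tree_;
};
```

Both the update and the prefix query touch O(log maxValue) cells, so a stream of n numbers costs O(n log maxValue) in total.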
Regardless of the container you are using, it is a very good idea to keep the elements sorted, so that at any point the index or iterator position of an element tells you how many elements come before it.
You need to implement your own binary search tree. Each node should store two counters holding the number of nodes in its left and right subtrees.
Insertion into a balanced binary tree takes O(log n). During the insertion, the counters of all ancestors of the new element should be incremented, which is also O(log n).
The number of elements smaller than the new element can be derived from the stored counters in O(log n).
So the total running time is O(n log n).
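A rough sketch of the counter idea follows. It is deliberately simplified: it keeps only a left-subtree counter per node (enough for counting smaller elements) and does no rebalancing, so the O(log n) bound holds only when the input happens to keep the tree reasonably balanced. The names Node and insertAndRank are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type: leftCount is the number of nodes in the left subtree.
struct Node {
    int value;
    int leftCount = 0;
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

// Inserts value (assumed not already present) and returns how many
// previously inserted values are smaller than it.
int insertAndRank(std::unique_ptr<Node>& root, int value) {
    int smaller = 0;
    std::unique_ptr<Node>* cur = &root;
    while (*cur) {
        Node& n = **cur;
        assert(value != n.value);  // the stream is non-repeating
        if (value < n.value) {
            ++n.leftCount;               // new node lands in n's left subtree
            cur = &n.left;
        } else {
            smaller += n.leftCount + 1;  // n and its whole left subtree are smaller
            cur = &n.right;
        }
    }
    *cur = std::make_unique<Node>(value);
    return smaller;
}
```

A production version would use a self-balancing tree (and maintain the counters through rotations) to guarantee the logarithmic bound.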
Keep your table sorted at each step and use binary search. When you search for the number just read from the input stream, binary search finds the position where it belongs, and that index is the count of stored numbers smaller than it. Each search is O(log n), but inserting into a sorted array shifts the later elements, which is O(n) per insert, so the algorithm takes O(n^2) time in the worst case.
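The sorted-table approach can be sketched with std::vector and std::lower_bound. The function name insertSortedAndCountSmaller is an assumption for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts value into an already sorted vector and returns the number of
// stored elements that are smaller than it.
std::size_t insertSortedAndCountSmaller(std::vector<int>& sorted, int value) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));  // debug-only check
    // O(log n): first position whose element is not less than value.
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);
    std::size_t smaller = static_cast<std::size_t>(pos - sorted.begin());
    // O(n): every element after pos is shifted one slot to the right.
    sorted.insert(pos, value);
    return smaller;
}
```

The shifting step is what makes the whole run quadratic in the worst case, although for small inputs the contiguous memory can still make this competitive in practice.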
What if you used insertion sort to store each number into a linked list? Then you can count the number of elements less than the new one when finding where to put it in the list.
It depends on whether you want to use std or not. In certain situations, some parts of std are inefficient. (For example, std::vector can be considered inefficient in some cases due to the amount of dynamic allocation that occurs.) It's a case-by-case type of thing.
One possible solution here might be to use a skip list (relative of linked lists), as it is easier and more efficient to insert an element into a skip list than into an array.
You have to use the skip list approach, so you can use a binary search to insert each new element. (One cannot use binary search on a normal linked list.) If you're tracking the length with an accumulator, returning the number of larger elements would be as simple as length-index.
One more point to consider: std::set::insert() is already O(log n) without a hint, so whether this approach gains any efficiency over std::set is open to question.
I am implementing a chained hash table using a vector<list>. I resized my vector to a prime number, let's say 5. To choose the key I am using universal hashing.
My question is, do I need to rehash my vector? I mean this code will always generate a key in the range 0 to 4 because it depends on the size of my hash table, causing collisions of course, but the new strings will all be added to the lists at each position in the vector... so it seems I don't need to resize/rehash the whole thing. What do you think? Is this a mistake?
Yes, you do. Otherwise objects will be in the wrong hash bucket and when you search for them, you won't find them. The whole point of hashing is to make locating an object faster -- that won't work if objects aren't where they're supposed to be.
By the way, you probably shouldn't be doing this. There are people who have spent years developing efficient hashing algorithms. Trying to roll your own will result in poor performance. Start with the article on linear hashing in Wikipedia.
do I need to rehash my vector?
Your container could continue to function without rehashing, but searching, insertions and erasing will perform more and more like a plain list instead of a hash table: for example, if you've inserted 10,000 elements you can expect each list in your vector to have roughly 2,000 elements, and you may have to search all 2,000 to see if a value you're considering inserting is a duplicate, or to find a value to erase, or simply to return an iterator to it. Sure, 2,000 is better than 10,000, but it's a long way from the O(1) performance expected of a quality hash table implementation. Your non-resizing implementation is still "O(N)".
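To make the point concrete, here is a minimal sketch of a chained table that grows and rehashes once the average chain length would exceed one. The class ChainedSet is hypothetical; it uses std::hash rather than the universal hashing mentioned in the question, and the growth rule (2n + 1 buckets, not necessarily prime) is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedSet {
public:
    explicit ChainedSet(std::size_t buckets = 5) : table_(buckets) {
        assert(buckets > 0);
    }

    // Returns false if key was already present.
    bool insert(const std::string& key) {
        if (contains(key)) return false;
        // Grow before the load factor (size / buckets) would exceed 1.
        if (size_ + 1 > table_.size()) rehash(table_.size() * 2 + 1);
        table_[indexOf(key, table_.size())].push_back(key);
        ++size_;
        return true;
    }

    bool contains(const std::string& key) const {
        for (const auto& k : table_[indexOf(key, table_.size())])
            if (k == key) return true;
        return false;
    }

    std::size_t size() const { return size_; }
    std::size_t bucketCount() const { return table_.size(); }

private:
    static std::size_t indexOf(const std::string& key, std::size_t buckets) {
        return std::hash<std::string>{}(key) % buckets;
    }

    // Every key must move: its bucket index depends on the bucket count.
    void rehash(std::size_t newBuckets) {
        std::vector<std::list<std::string>> fresh(newBuckets);
        for (auto& chain : table_)
            for (auto& key : chain)
                fresh[indexOf(key, newBuckets)].push_back(std::move(key));
        table_.swap(fresh);
    }

    std::vector<std::list<std::string>> table_;
    std::size_t size_ = 0;
};
```

With this policy the chains stay short on average, which is what keeps lookups close to O(1) instead of degrading toward a list scan.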
Is this a mistake?
Yes, a fundamental one.
I am writing a program that will do basic compression using a lookup table. To create the table, I read in a text file (size 2 MB), find the 255 most common words, and store them in another text file. I am using a vector now, but it is slow: it takes about a minute to insert into the vector, sort it, and output the top 255 elements to another text file. The insertion appears to be the problem, since I have to check whether each word already exists in the vector and then increment a counter if it does, or add the element to the end of the vector if it doesn't. I need an efficient way of inserting elements into a data structure only when they are not already in it (no duplicates).
std::unordered_map is likely to be the best for your purpose, though there are no guarantees. You can "add a key if and only if not already present" just by using operator[].
You'll make one pass over the 2MB splitting into words and counting the frequencies (one lookup in the structure per word). Then use std::partial_sort_copy (the version that takes a comparator) to get the top 255 by frequency count from the unordered_map. You should partial_sort_copy into a vector or array and then use that to write the file.
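A sketch of that two-step approach, assuming words are simply whitespace-separated tokens. The function name topWords and the alias WordCount are hypothetical, ties are broken alphabetically only to make the result deterministic, and the generic lambda requires C++14.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using WordCount = std::pair<std::string, std::size_t>;

// Returns up to k (word, count) pairs, most frequent first.
std::vector<WordCount> topWords(const std::string& text, std::size_t k) {
    assert(k > 0);

    // One pass: operator[] inserts a zero count the first time a word is seen.
    std::unordered_map<std::string, std::size_t> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++counts[word];

    // Copy only the k best entries, already sorted, into the result.
    std::vector<WordCount> top(std::min(k, counts.size()));
    std::partial_sort_copy(
        counts.begin(), counts.end(), top.begin(), top.end(),
        [](const auto& a, const auto& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;  // alphabetical tie-break
        });
    return top;
}
```

For the real program the text would come from the 2 MB file and k would be 255; the result vector can then be written straight to the output file.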
For 2 MB of data, anything over a few seconds is certainly slower than it "should" be, and a few seconds is still slower than it could be. So you're right to be concerned about your vector, but you should also profile your code to make sure that it really is the vector costing you the time, not some other issue.
Try using an STL map or set; looking up an element in either is much faster than searching a vector.
I want to create an animation system in C++ in which I store keyframes that have a time and a value. Those values shall be interpolated during playback, so I need them sorted by their time variable, because when interpolating I always want to interpolate only between the last and the next keyframe (how it's usually done).
How should I store the keyframes so that I can easily (and quickly) access the keyframe before and after a specific time?
At first std::map came to my mind, but there I have problems with the correct order of the keyframes... Any ideas how to do this better?
You can use std::vector and keep the correct order of the keyframes.
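Assuming that the keyframes are sorted by time in the vector, you can then find the relevant keyframe with std::lower_bound or std::upper_bound in logarithmic time (std::binary_search only reports whether a matching element exists, so it cannot return the keyframe itself).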
std::map internally keeps the elements sorted by the key following a strict weak ordering criterion. So, if you use time as the key, you will keep the correct order of the keyframes.
Personally, I would use std::vector.
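A possible shape for the vector-based lookup, as a sketch only. The struct Keyframe, the function name sample, the use of linear interpolation, and the clamping outside the keyframe range are all assumptions, since the question does not specify them.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical keyframe layout.
struct Keyframe {
    float time;
    float value;
};

// Linearly interpolates between the keyframes surrounding t.
// Precondition: keys is non-empty and sorted by ascending time.
float sample(const std::vector<Keyframe>& keys, float t) {
    assert(!keys.empty());
    // First keyframe strictly after t.
    auto next = std::upper_bound(
        keys.begin(), keys.end(), t,
        [](float value, const Keyframe& k) { return value < k.time; });
    if (next == keys.begin()) return keys.front().value;  // before the first key
    if (next == keys.end()) return keys.back().value;     // at or after the last key
    auto prev = std::prev(next);
    // next->time > t >= prev->time, so the denominator is never zero.
    float u = (t - prev->time) / (next->time - prev->time);
    return prev->value + u * (next->value - prev->value);
}
```

During sequential playback one could also cache the last iterator and advance it linearly, which avoids the binary search on most frames.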
I am writing a numerical simulation program that uses std::map to store key-value pairs. The map stores the states that evolve during the simulation. The key is an integer, and the value for a key tells how many copies of that key there are, i.e. a std::map from integer keys to counts. At each step of the simulation I need to find how many copies there are for a given key, so I check that with the following code
if (map[key]>0) {do something here with the number of copies}
However, I soon found that this code doesn't work: even when there is no such key in the map, calling map[key] creates an entry for that key and sets its value to zero, so I always overcount the total number of keys reported by std::map.size(). I later changed the code as follows to search for the key instead
if (map.find(key)!=map.end()) {...}
So is this the only and fastest way to check whether a key exists in a map? I am going to run the simulation hundreds of millions of times, and it will call the code above very often to check the key. Will map.find() be too slow? Thanks.
The find member function is probably the fastest way to find whether a key is already in the map. That said, if you don't need to iterate over items in the map in order, you might get better performance with an std::unordered_map instead.
In a std::map or hashtable (std::unordered_map), the find function is very fast, as fast as using the [] subscripting operator. In fact, it's faster when the element is not found, because it doesn't have to insert one.
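A small sketch of the non-inserting lookup. The helper name copiesOf and the std::map<int, int> type are assumptions based on the description in the question.

```cpp
#include <map>

// Returns the stored count for key, or 0 if the key is absent.
// Taking the map by const reference guarantees nothing is inserted:
// operator[] is not even callable on a const std::map.
int copiesOf(const std::map<int, int>& states, int key) {
    auto it = states.find(key);
    return it == states.end() ? 0 : it->second;
}
```

Because find never creates an entry, size() keeps reporting only the keys that were actually stored, which avoids the overcounting described in the question.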
I don't think there is much difference in speed between the various ways to check for the existence of a key. On the other hand, if your keys are integers and their range is known, you might just use an array.
BTW:
I got interested in the speed of a simple array, vector, map and unordered map. I wrote a simple program that does container[n]++ 100000000 times, where n is a random number in the range 0 to 10000. The results:
array: 1.27s
vector: 1.36s
unordered map: 2.6s
map: 11.6s
The overhead of loop + index calculation in this simple case is ~0.8s.
So it all depends on how much time is spent elsewhere. If it's considerably more (per 100000000 iterations) then it does not matter much what you use. But if it's not, it can be quite a difference.
You can use hash_map (std::unordered_map in standard C++); it is typically the fastest data structure for this kind of key-value lookup.
You can also use map, but it is slower than hash_map.