populate a list once - list

Using datomic and clojure I query a database and get a list of nine elements.
I would like to draw these nine elements on the page, and at this point they are guaranteed to be distinct elements.
However, every time I call the function, it redoes the query, returns a new list, and then gets an element from the new list. This is very inefficient and also introduces the possibility of duplicates.
I would like to memoize this list and have it be nth-indexable. Suggestions and ideas welcome.

Instead of calling the function every time, call it only if you have not called it already. If you call it, make sure to store the result. If you don't call it, look up the stored result.
memoize (http://crossclj.info/fun/clojure.core/memoize.html) might help you to achieve that. Depending on your cache requirements, you might want to study its implementation and implement something more suitable.
You might want to refer to https://github.com/clojure/core.memoize for more sophisticated memoization needs on the server side.
Any list is nth-indexable, but on a list nth is O(n). For O(log32 n) performance, which is effectively constant time, create a vector from the list using vec.

Related

How to search the value from a std::map when I use cuda?

I have some data stored in a std::map, which maps a string to a vector. Its keys and values look like
key value
"a"-----[1,2,3]
"b"-----[8,100]
"cde"----[7,10]
For each thread, it needs to process one query. The query looks like
["a", "b"]
or
["cde", "a"]
So I need to get the value from the map and then do some other jobs like combine them. So as for the first query, the result will be
[1,2,3,8,100]
The problem is, how can threads access the map and find the value by a key?
At first, I tried to store it in global memory. However, it looks like I can only pass arrays from host to device.
Then I tried to use thrust, but I can only use it to store vectors.
Is there any other way I can use? Or maybe I overlooked some methods in thrust? Thanks!
PS: I do not need to modify the map, I just need to read data from it.
I believe it's unlikely you will benefit from doing any of this on the GPU, unless you have a huge number of queries which are all available to you at once, or at least in batches.
If you do not have that many queries, then just transferring the data (regardless of its exact format/structure) will likely be a waste.
If you do have that many queries, the benefit is still entirely unclear, and depends on a lot of parameters. The fact that you've been trying to use std::map for anything suggests (see below for the reason) that you haven't been seriously concerned with performance so far. If that's indeed the case, just don't make your life difficult by using a GPU.
So what's wrong with std::map? Nothing, but it's extremely slow even on the CPU, and this is even worse on the GPU.
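Since the question notes that plain arrays are what transfers easily from host to device, one possible layout is to flatten the read-only map into a sorted key array, an offset array, and one concatenated value array. The sketch below shows this on the host side only; `FlatMap`, `flatten`, and `run_query` are hypothetical names, and no CUDA or thrust calls are shown. Whether moving this to the GPU pays off still depends on the batching concerns above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened, read-only view of a std::map<std::string, std::vector<int>>.
struct FlatMap {
    std::vector<std::string> keys;     // sorted keys (std::map iterates in key order)
    std::vector<std::size_t> offsets;  // values of keys[i] are values[offsets[i] .. offsets[i+1])
    std::vector<int> values;           // all value vectors, concatenated
};

FlatMap flatten(const std::map<std::string, std::vector<int>>& m) {
    FlatMap f;
    f.offsets.push_back(0);
    for (const auto& kv : m) {
        f.keys.push_back(kv.first);
        f.values.insert(f.values.end(), kv.second.begin(), kv.second.end());
        f.offsets.push_back(f.values.size());
    }
    assert(f.offsets.size() == f.keys.size() + 1);
    return f;
}

// Concatenate the values of every key in the query, in query order.
// Keys that are not present are skipped.
std::vector<int> run_query(const FlatMap& f, const std::vector<std::string>& query) {
    std::vector<int> out;
    for (const auto& key : query) {
        auto it = std::lower_bound(f.keys.begin(), f.keys.end(), key);  // binary search
        if (it == f.keys.end() || *it != key) continue;
        const auto i = static_cast<std::size_t>(it - f.keys.begin());
        const auto first = f.values.begin() + static_cast<std::ptrdiff_t>(f.offsets[i]);
        const auto last = f.values.begin() + static_cast<std::ptrdiff_t>(f.offsets[i + 1]);
        out.insert(out.end(), first, last);
    }
    return out;
}
```

The offset and value arrays are plain contiguous buffers. The string keys are not, so a device version would probably first replace each key with its integer index in `keys`.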

Chained hash table keys with universal hashing: does it need a rehash?

I am implementing a chained hash table using a vector<list>. I resized my vector to a prime number, let's say 5. To choose the key I am using universal hashing.
My question is, do I need to rehash my vector? I mean, this code will always generate a key in a range between 0 and 5 because it depends on the size of my hash table. That causes collisions of course, but the new strings will all be added to the lists at each position in the vector, so it seems I don't need to resize/rehash the whole thing. What do you think? Is this a mistake?
Yes, you do. Otherwise objects will be in the wrong hash bucket and when you search for them, you won't find them. The whole point of hashing is to make locating an object faster -- that won't work if objects aren't where they're supposed to be.
By the way, you probably shouldn't be doing this. There are people who have spent years developing efficient hashing algorithms. Trying to roll your own will result in poor performance. Start with the article on linear hashing in Wikipedia.
do I need to rehash my vector?
Your container could continue to function without rehashing, but searching, insertion and erasure will perform more and more like a plain list instead of a hash table. For example, if you've inserted 10,000 elements into 5 buckets, you can expect each list in your vector to have roughly 2,000 elements, and you may have to search all 2,000 to see if a value you're considering inserting is a duplicate, to find a value to erase, or simply to return an iterator to it. Sure, 2,000 is better than 10,000, but it's a long way from the O(1) performance expected of a quality hash table implementation. Your non-resizing implementation is still "O(N)".
Is this a mistake?
Yes, a fundamental one.
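To make the rehashing step concrete, here is a minimal sketch of a chained table that grows once the average chain length would exceed one. It is not the asker's code: `ChainedSet` is a hypothetical class, `std::hash` stands in for the universal hash function, and the `2n + 1` growth rule is an arbitrary choice that does not keep the bucket count prime.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash set of strings that rehashes as it grows.
class ChainedSet {
public:
    explicit ChainedSet(std::size_t buckets = 5) : table_(buckets) {
        assert(buckets > 0);
    }

    bool contains(const std::string& s) const {
        for (const auto& item : table_[index(s, table_.size())])
            if (item == s) return true;
        return false;
    }

    // Returns false if s was already present.
    bool insert(const std::string& s) {
        if (contains(s)) return false;
        // Grow before the load factor (size / buckets) would exceed 1.
        if (size_ + 1 > table_.size()) rehash(table_.size() * 2 + 1);
        table_[index(s, table_.size())].push_back(s);
        ++size_;
        return true;
    }

    std::size_t bucket_count() const { return table_.size(); }
    std::size_t size() const { return size_; }

private:
    // The bucket index depends on the bucket count, which is why every
    // element has to be re-indexed when the table is resized.
    static std::size_t index(const std::string& s, std::size_t buckets) {
        return std::hash<std::string>{}(s) % buckets;
    }

    void rehash(std::size_t new_buckets) {
        assert(new_buckets > table_.size());
        std::vector<std::list<std::string>> bigger(new_buckets);
        for (auto& chain : table_) {
            for (auto& item : chain) {
                const std::size_t i = index(item, new_buckets);
                bigger[i].push_back(std::move(item));
            }
        }
        table_ = std::move(bigger);
    }

    std::vector<std::list<std::string>> table_;
    std::size_t size_ = 0;
};
```

Resizing the vector without the `rehash` loop would leave elements in buckets computed for the old size, which is the "wrong hash bucket" problem described above.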

Efficiently searching a Linked List of Objects for a String

For certain reasons, I have a linked list of objects, with the Object containing a string.
I might be required to search for a particular string, and in doing so, retrieve the object, based on that string.
The starting header of the list is the only input I have for the list.
Though the number of objects I have, is capped at 3000, and that's not so much, I still wondered if there was an efficient way to do this, instead of searching the objects one by one for a matching string.
The Objects in the list are not sorted in any way, and I cannot expect them to be sorted, and the entry point to the linked list is the only input I have.
So, could anyone tell me if there's an efficient way ( search algorithm perhaps ) to achieve this?
Separately for this kind of search, if required, what would be the sort of data structure recommended, assuming that this search be the most data intensive function of the object?
Thanks..
Use a std::map<std::string, YourObjectType>. You can still iterate all objects. But they are sorted by the string now.
If you might have multiple objects with the same string, use a multimap instead.
If you can't switch to any different structure/container, then there is no way to do this better than linear to size of list.
With 3000 objects, you would do better with an unordered_map than with a linked list, since it gives you average O(1) lookup, insertion, and deletion time.
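If the linked list itself has to stay, one option is to walk it once and build a hash index next to it. This is only a sketch: `Node`, `build_index`, and `find_node` are hypothetical names, and the index has to be rebuilt or updated whenever the list changes.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical node type standing in for the objects in the question.
struct Node {
    std::string name;
    Node* next;
};

// Walk the list once from its head and index each node by its string.
// If a string occurs more than once, the first node wins.
std::unordered_map<std::string, Node*> build_index(Node* head) {
    std::unordered_map<std::string, Node*> index;
    for (Node* n = head; n != nullptr; n = n->next)
        index.emplace(n->name, n);  // emplace does not overwrite an existing key
    return index;
}

// Average O(1) lookup; returns nullptr when no node has that string.
Node* find_node(const std::unordered_map<std::string, Node*>& index,
                const std::string& name) {
    auto it = index.find(name);
    if (it == index.end()) return nullptr;
    assert(it->second->name == name);  // the index must still match the list
    return it->second;
}
```

Building the index is a single linear pass, so it only pays off when more than a handful of searches follow.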

What's the best way to search from several map<key,value>?

I have created a vector which contains several map<>.
vector<map<key,value>*> v;
v.push_back(&map1);
// ...
v.push_back(&map2);
// ...
v.push_back(&map3);
At any point in time, if a value has to be retrieved, I iterate through the vector and look for the key in every map element (i.e. v[0], v[1] etc.) until it's found. Is this the best way? I am open to any suggestion. This is just an idea; I have yet to implement it this way (please point out any mistakes).
Edit: It's not important, in which map the element is found. In multiple modules different maps are prepared. And they are added one by one as the code progresses. Whenever any key is searched, the result should be searched in all maps combined till that time.
Without more information on the purpose and use, it might be a little difficult to answer. For example, is it necessary to have multiple map objects? If not, then you could store all of the items in a single map and eliminate the vector altogether. This would be more efficient to do the lookups. If there are duplicate entries in the maps, then the key for each value could include the differentiating information that currently defines into which map the values are put.
If you need to know which submap the key was found in, try:
unordered_map<key, pair<mapid, value>>
This has much better complexity for searching.
If the keys do not overlap, i.e., are unique throughout all maps, then I'd advise a set or unordered_set with a custom comparison functor, as this will help with the lookup. Or even extend the first map with the new maps, if profiling shows that is fast enough / faster.
If the keys are not unique, go with a multiset or unordered_multiset, again with a custom comparison functor.
You could also sort your vector manually and search it with a binary_search. In any case, I advise using a tree to store all maps.
It depends on how your maps are "independently created", but if it's an option, I'd make just one global map (or multimap) object and pass that to all your creators. If you have lots of small maps all over the place, you can just call insert on the global one to merge your maps into it.
That way you have only a single object in which to perform lookup, which is reasonably efficient (O(log n) for multimap, expected O(1) for unordered_multimap).
This also saves you from having to pass raw pointers to containers around and having to clean up!
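As a rough sketch of the merge-into-one-map idea, assuming string keys and int values purely for illustration (`Map` and `merge_all` are hypothetical names):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Map = std::map<std::string, int>;  // hypothetical key and value types

// Merge several independently built maps into one. std::map::insert keeps the
// existing entry when a key is already present, so earlier maps take priority,
// which matches a front-to-back search through the vector.
Map merge_all(const std::vector<const Map*>& maps) {
    Map merged;
    for (const Map* m : maps) {
        assert(m != nullptr);
        merged.insert(m->begin(), m->end());
    }
    return merged;
}
```

A lookup is then a single `find` on the merged map instead of one `find` per map. If duplicate keys must all be kept, the same loop works with a `std::multimap` as the target.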

How can I find out the diff of two vectors of hashes in clojure?

I have a vector which contains a list of Hash maps in Clojure and I have an add-watch on this vector to see any changes made. Is there an easy way to do a diff on the changes made to the hash map, so that maybe I could get a list of just the changed entries in the hash?
Note: This follows on from some earlier posts of mine where I tried to persist changes to a database for a data structure stored in a ref. I have realised that the easiest way to save state is simply to watch the ref for changes and then store those changes. My ideal solution would be if the add-watch was passed a changelist as well :)
You probably need to define "diff" a little more precisely. For example, does an insertion in the middle of the vector count as a single change or as a change of that element and all subsequent ones? Also are your vectors guaranteed to be the same length?
Having said that, the simple approach would be something like:
First check the length of the two vectors. If one is longer, then consider the extra elements as changes.
Then compare all the other elements with the corresponding element in the other vector using not= (this works with hashes and will be very fast in the common case that the elements haven't changed). Something to get you started: (map not= vector-1 vector-2)
You can then use the answer from stackoverflow.com/questions/3387155/difference-between-two-maps that pmf mentioned if you want to find out exactly how the two maps differ.