Database-like search algorithm in C++

I have a collection of objects. Each object is described by ~4 parameters (let's say two integers and two strings). How can I implement this collection in C++ so that I can quickly find subsets of these objects by specifying search criteria, e.g. "find all objects with first parameter equal to 1" or "find all objects with second parameter equal to 'foo'" (a lookup always uses a one-parameter query: parameter=value)? Should I have 4 std::maps, so that each parameter-based lookup is performed in O(log n)? What if I add another parameter, and another?
Are there any existing solutions for this problem?

You should try Boost Multi Index, which is meant for just this kind of thing.

One array for the data, four hash tables (std::tr1::unordered_map) for the indexes.
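A minimal sketch of the "one array, four hash tables" layout, assuming C++11's std::unordered_multimap in place of the TR1 container. The names Record, RecordStore, and find_by_a through find_by_d are hypothetical and chosen only for illustration. Each index maps a field value to positions in the data vector, so several records can share a value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record type: two integers and two strings, as in the question.
struct Record {
    int a;
    int b;
    std::string c;
    std::string d;
};

class RecordStore {
public:
    // Appends a record and registers its position in every index.
    std::size_t add(Record r) {
        const std::size_t pos = data_.size();
        by_a_.emplace(r.a, pos);
        by_b_.emplace(r.b, pos);
        by_c_.emplace(r.c, pos);
        by_d_.emplace(r.d, pos);
        data_.push_back(std::move(r));
        return pos;
    }

    const Record& at(std::size_t pos) const {
        assert(pos < data_.size());
        return data_[pos];
    }

    // Each lookup returns positions into the data vector (order unspecified).
    std::vector<std::size_t> find_by_a(int v) const { return lookup(by_a_, v); }
    std::vector<std::size_t> find_by_b(int v) const { return lookup(by_b_, v); }
    std::vector<std::size_t> find_by_c(const std::string& v) const { return lookup(by_c_, v); }
    std::vector<std::size_t> find_by_d(const std::string& v) const { return lookup(by_d_, v); }

private:
    // Collects every position stored under `key` in one index.
    template <typename Index, typename Key>
    static std::vector<std::size_t> lookup(const Index& index, const Key& key) {
        std::vector<std::size_t> out;
        auto range = index.equal_range(key);
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

    std::vector<Record> data_;
    std::unordered_multimap<int, std::size_t> by_a_, by_b_;
    std::unordered_multimap<std::string, std::size_t> by_c_, by_d_;
};
```

Adding a fifth parameter means adding one more index member and one more emplace call in add. This sketch does not support removal; erasing from the vector would invalidate the stored positions.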

Related

Elixir: Select multiple elements from a list based on index

Assuming I have a list, is there a built-in operator or function to select elements based on a list of indices?
For example, an operator something like this ["a", "b", "z"] = alphabet[0, 1, 25]
A naive implementation of this could be:
def select(list, indices) do
Enum.map(indices, &(Enum.at(list, &1)))
end
If it doesn't exist, is this a deliberate omission to avoid lists being treated like arrays?
An example of what I'm attempting that made me want this, in case I'm asking the wrong question: Given a list, I want to select the first, middle, and last elements, then calculate the median of the three. I was doing length(list) to calculate the length, then I wanted to use this operator/function to select the three elements I'm interested in.

As far as I know, no such built-in operator exists. Each time I have to fetch several elements from a list, I use the same implementation as yours. It is short and simple to recreate, and I suspect that is why there is no off-the-shelf solution in Elixir.
Another reason I can think of is, as you pointed out, that lists aren't arrays: to access one element, you have to traverse all the elements before it. Accessing elements by a list of indices is therefore not a natural operation, because lists are not optimized to be used that way.
Still, I often access a list of elements with a list of indices, which may mean that I am not using Elixir the right way.

Efficiently searching a Linked List of Objects for a String

For certain reasons, I have a linked list of objects, with the Object containing a string.
I might be required to search for a particular string, and in doing so, retrieve the object, based on that string.
The starting header of the list is the only input I have for the list.
Though the number of objects I have, is capped at 3000, and that's not so much, I still wondered if there was an efficient way to do this, instead of searching the objects one by one for a matching string.
The Objects in the list are not sorted in any way, and I cannot expect them to be sorted, and the entry point to the linked list is the only input I have.
So, could anyone tell me if there's an efficient way ( search algorithm perhaps ) to achieve this?
Separately for this kind of search, if required, what would be the sort of data structure recommended, assuming that this search be the most data intensive function of the object?
Thanks..

Use a std::map<std::string, YourObjectType>. You can still iterate all objects. But they are sorted by the string now.
If you might have multiple objects with the same string, use a multimap instead.

If you can't switch to a different structure/container, then there is no way to do this better than linear in the size of the list.

With 3000 objects, you would be better off using an unordered map instead of a linked list, which gives you average O(1) lookup, insertion, and deletion time.
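If the linked list itself has to stay, one option is to walk it once and build a hash index over it. The following is a rough sketch under that assumption; Node, build_index, and find_by_name are hypothetical names, and the node layout is invented for illustration. Building the index is still O(n), but each later lookup is O(1) on average.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical node type: a singly linked list of objects holding a string.
struct Node {
    std::string name;
    int payload;
    Node* next;
};

// Walks the list once and maps each string to its node.
// If a string occurs more than once, the first node wins,
// because emplace does not overwrite an existing key.
std::unordered_map<std::string, Node*> build_index(Node* head) {
    std::unordered_map<std::string, Node*> index;
    for (Node* n = head; n != nullptr; n = n->next) {
        index.emplace(n->name, n);
    }
    return index;
}

// Returns the matching node, or nullptr if the string is not present.
Node* find_by_name(const std::unordered_map<std::string, Node*>& index,
                   const std::string& name) {
    auto it = index.find(name);
    if (it == index.end()) {
        return nullptr;
    }
    // The index goes stale if nodes are renamed after it was built.
    assert(it->second->name == name);
    return it->second;
}
```

The index stores raw pointers into the list, so it has to be rebuilt (or updated) whenever nodes are added, removed, or renamed.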

Hierarchical filtered lookup in C++

I have been pondering a data structure problem for a while, but can't seem to come up with a good solution. I can not shake off the feeling that the solution is simple and I'm just not seeing it, however, so hopefully you guys can help!
Here is the problem: I have a large collection of objects in memory. Each of them has a number of data fields. Some of the data fields, such as an ID, are unique for each objects, but others, such as a name, can appear in multiple objects.
class Object {
size_t id;
std::string name;
Histogram histogram;
Type type;
...
};
I need to organize these objects in a way that will allow me to quickly (even if the number of objects is relatively large, i.e. millions) filter the collection given a specification of an arbitrary number of object members while all members that are left unspecified count as wildcards. For example, if I specify a given name, I want to retrieve all the objects whose name member equals the given name. However, if I then add a histogram to the query, I would like the query to return only the objects that match in both the name and the histogram fields, and so on. So, for example, I'd like a function
std::set<Object*> retrieve(size_t, std::string, Histogram, Type)
that can both do
retrieve(42, WILDCARD, WILDCARD, WILDCARD)
as well as
retrieve(42, WILDCARD, WILDCARD, Type_foo)
where the second call would return fewer or equally as many objects as the first one. Which data structure allows queries like this and can both be constructed and queried in reasonable time for object counts in the millions?
Thanks for the help!

First, you could use Boost Multi-index to implement efficient lookup over different members of your Object. This helps to limit the number of elements to consider. As a second step, you can use a lambda expression as a predicate for std::find_if to get the first matching element, or use std::copy_if to copy all matching elements to a target sequence. If you decide to use Boost, you can also use Boost Range with filtering.
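The filtering step could look roughly like the sketch below, which assumes C++17 and uses std::optional to represent a wildcard (an empty optional matches anything). Query, matches, and this retrieve signature are hypothetical; the Histogram member is left out to keep the example self-contained. The candidates vector stands in for whatever subset an index lookup has already narrowed down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

enum class Type { foo, bar };

// Simplified stand-in for the Object class in the question.
struct Object {
    std::size_t id;
    std::string name;
    Type type;
};

// An empty optional acts as a wildcard for that field.
struct Query {
    std::optional<std::size_t> id;
    std::optional<std::string> name;
    std::optional<Type> type;
};

// True if every specified field of the query equals the object's field.
bool matches(const Object& o, const Query& q) {
    return (!q.id || *q.id == o.id) &&
           (!q.name || *q.name == o.name) &&
           (!q.type || *q.type == o.type);
}

// Copies all matching candidates into the result using std::copy_if.
std::vector<Object> retrieve(const std::vector<Object>& candidates,
                             const Query& q) {
    std::vector<Object> out;
    std::copy_if(candidates.begin(), candidates.end(),
                 std::back_inserter(out),
                 [&q](const Object& o) { return matches(o, q); });
    assert(out.size() <= candidates.size());  // filtering never adds elements
    return out;
}
```

On its own this is a linear scan, so for millions of objects it only pays off when an index on the most selective field has already reduced the candidate set.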

c++ last element of a structure field

I get a structure, and I don't know the size of it (every time it's different). I would like to set the last place in one of the fields of this structure to a certain value. In pseudocode, I mean something like this:
structureA.fieldB[end] = cert_value;
I'd do it in matlab however I cannot somehow find the proper syntax in c++, can you help me?

In Matlab, a structure data type holds key-value pairs where the "value" may be of different types. In C++, there are some key-value containers available (associative containers like set, map, multimap), but they usually store elements of a single type. What you need if I understood it right is something like
"one" : 1
"two" : [1,2,5]
"three" : "name"
Which means that your structure resembles a Python dictionary.
In C++, the only way I have heard of using containers with truly different types is by using boost::any, which is accepted as the answer to this question.
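If you pack a container with elements of different types, you can use the container's back() member function (available on sequence containers such as std::vector) to access the last element. Note that end() returns a past-the-end iterator, so it must be decremented before it is dereferenced.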

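You need sizeof, which gives you the size of the array in bytes. Since you want the index of the last element, divide this number by the number of bytes in one element to get the element count, then subtract one. This only works when fieldB is a true array whose size is known at compile time, not a pointer or a dynamically sized container. You end up with:
int index_end = sizeof(structureA.fieldB) / sizeof(structureA.fieldB[0]) - 1;
structureA.fieldB[index_end] = new_value;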
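If the size really differs every time, the field is more likely a dynamically sized container than a fixed array. A small sketch, assuming fieldB can be a std::vector<double> (the question does not say what type it is); StructureA and set_last are hypothetical names:

```cpp
#include <cassert>
#include <vector>

// Hypothetical structure whose field changes size from run to run.
struct StructureA {
    std::vector<double> fieldB;
};

// Overwrites the last element of fieldB, the C++ counterpart of
// MATLAB's structureA.fieldB(end) = value.
void set_last(StructureA& s, double value) {
    assert(!s.fieldB.empty());  // back() on an empty vector is undefined behavior
    s.fieldB.back() = value;
}
```

An equivalent spelling is s.fieldB[s.fieldB.size() - 1] = value, which is closer to the index arithmetic used with plain arrays.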

Given 200 strings, what is a good way to key a LUT of relationship values

I've got 200 strings. Each string has a relationship (measured by a float between 0 and 1) with every other string. This relationship is two-way; that is, relationship A/B == relationship B/A. This yields n(n-1)/2 relationships, or 19,900.
What I want to do is store these relationships in a lookup table so that given any two words I can quickly find the relationship value.
I'm using c++ so I'd probably use a std::map to store the LUT. The question is, what's the best key to use for this purpose.
The key needs to be unique and needs to be able to be calculated quickly from both words.
My approach is going to be to create a unique identifier for each word pair. For example given the words "apple" and "orange" then I combine them together as "appleorange" (alphabetical order, smallest first) and use that as the key value.
Is this a good solution or can someone suggest something more cleverer? :)

Basically you are describing a function of two parameters with the added property that order of parameters is not significant.
Your approach will work if there is no ambiguity between words when they are concatenated (I would suggest putting a comma or a similar separator between the two words to remove possible ambiguities). Any 2D array would also work.
I would probably convert each keyword to some unique identifier (using a simple map) before trying to find the relationship value, but it does not change much from what you are proposing.

If boost/tr1 is acceptable, I would go for an unordered_map with the pair of strings as key. The main question would then be: what about the order of the strings? This could be handled by the hash function, which would always start with the lexically first string.
Remark: this is just a suggestion after reading the design-issue, not a study.

How "quickly" is quickly? Given you don't care about the order of the two words, you could try a map like this:
std::map<std::set<std::string>, double> lut;
Here the key is a set of the two words, so if you insert "apple" and "orange", then the order is the same as "orange" "apple", and given set supports the less than operator, it can function as a key in a map. NOTE: I intentionally did not use a pair for a key, given the order matters there...
I'd start with something fairly basic like this, profile and see how fast/slow the lookups etc. are before seeing if you need to do anything smarter...
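A pair can also serve as the key if the two words are put into a fixed order before every insert and lookup. The sketch below is one way to do that, not necessarily what any of the answers had in mind; Key, make_key, and RelationshipTable are hypothetical names. Keeping the words as separate pair members also avoids the concatenation ambiguity ("ab" + "c" versus "a" + "bc").

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Key = std::pair<std::string, std::string>;

// Normalizes the word order so that (a, b) and (b, a) give the same key.
Key make_key(const std::string& a, const std::string& b) {
    return a < b ? Key{a, b} : Key{b, a};
}

class RelationshipTable {
public:
    void set(const std::string& a, const std::string& b, float value) {
        assert(value >= 0.0f && value <= 1.0f);  // range stated in the question
        lut_[make_key(a, b)] = value;
    }

    bool contains(const std::string& a, const std::string& b) const {
        return lut_.count(make_key(a, b)) != 0;
    }

    // Throws std::out_of_range if the pair was never stored.
    float get(const std::string& a, const std::string& b) const {
        return lut_.at(make_key(a, b));
    }

private:
    std::map<Key, float> lut_;
};
```

Compared with a std::set<std::string> key, this avoids allocating a small tree per key, though whether that matters for 200 words is something only profiling would show.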

If you create a sorted array with the 200 strings, then you can binary search it to find the matching indices of the two strings, then use those two indices in a 2D array to find the relationship value.

If your 200 strings are in an array, your 20,100 similarity values can be in a one dimensional array too. It's all down to how you index into that array. Say x and y are the indexes of the strings you want the similarity for. Swap x and y if necessary so that y>=x, then look at entry i= x + y(y+1)/2 in the large array.
(x,y) of (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),(0,3),(1,3)... will take you to entry 0,1,2,3,4,5,6,7...
So this uses space optimally and it gives faster look up than a map would. I'm assuming efficiency is at least mildly important to you since you are using C++!
[if you're not interested in self similarity values where y=x, then use i = x + y(y-1)/2 instead].
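The triangular indexing described above can be sketched as follows, assuming the strings have already been mapped to indices 0..n-1 (for example by binary search in a sorted array). SymmetricTable is a hypothetical name; this version keeps the self-similarity entries, so it uses i = x + y(y+1)/2.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stores one value per unordered pair (x, y), including x == y,
// in a flat array of n(n+1)/2 entries.
class SymmetricTable {
public:
    explicit SymmetricTable(std::size_t n)
        : n_(n), values_(n * (n + 1) / 2, 0.0f) {}

    // Maps (x, y) to a flat index; the order of x and y does not matter.
    static std::size_t index(std::size_t x, std::size_t y) {
        if (x > y) {
            std::swap(x, y);  // ensure y >= x
        }
        return x + y * (y + 1) / 2;
    }

    void set(std::size_t x, std::size_t y, float value) {
        assert(x < n_ && y < n_);
        values_[index(x, y)] = value;
    }

    float get(std::size_t x, std::size_t y) const {
        assert(x < n_ && y < n_);
        return values_[index(x, y)];
    }

    std::size_t size() const { return values_.size(); }

private:
    std::size_t n_;
    std::vector<float> values_;
};
```

For n = 200 the table holds 20,100 floats, and a lookup costs one comparison, one multiplication, and one array access.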
