Container for database-like searches - c++

I'm looking for some STL, Boost, or similar container that I can use the same way indexes are used in databases, to search for records using a query like this:
select * from table1 where field1 starting with 'X';
or
select * from table1 where field1 like 'X%';
I thought about using std::map, but I cannot because I need to search for fields that "start with" some text, and not those that are "equal to". Besides that, I need it to work on multiple fields (each "record" has 6 fields, for example), so I would need a separate std::map for each one.
I could create a sorted vector or list and use binary search (breaking the set in 2 in each step by reading the element in the middle and seeing if it's more or less than 'X'), but I wonder if there is some ready-made container I could use without reinventing the wheel?

Boost.MultiIndex lets you maintain several indices over the same data, and its ordered indices implement lower_bound just as std::set/map do. You select the index corresponding to the field and then use it as if it were a map or a set.
Below is a generic function that returns a pair of iterators: the first points to the first item starting with a given prefix, and the second to the first item starting with the next prefix, i.e. the end of the search range.
template <typename SortedAssociativeContainer>
std::pair<typename SortedAssociativeContainer::const_iterator,
          typename SortedAssociativeContainer::const_iterator>
starts_with(
    SortedAssociativeContainer const& coll,
    typename SortedAssociativeContainer::key_type const& k)
{
    // Half-open range: [first element >= k, first element >= next_prefix(k))
    return std::make_pair(coll.lower_bound(k),
                          coll.lower_bound(next_prefix(k)));
}
where next_prefix gets the next prefix in lexicographic order, based on the container's comparator (of course this function needs more arguments to be completely generic, see the question).
The result of starts_with can be used with any range algorithm (see Boost.Range).
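The answer leaves next_prefix undefined. The following is only a sketch of one way it could look for std::string keys ordered by the default comparison; the names next_prefix (as implemented here) and prefix_range are hypothetical, and a custom comparator would need a different helper.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: the smallest string that sorts after every string
// beginning with `prefix` (default std::string ordering assumed).
// Returns an empty string when no such upper bound exists.
inline std::string next_prefix(std::string prefix)
{
    // Trailing 0xFF bytes cannot be incremented, so drop them first.
    while (!prefix.empty() && static_cast<unsigned char>(prefix.back()) == 0xFF)
        prefix.pop_back();
    if (!prefix.empty())
        prefix.back() = static_cast<char>(static_cast<unsigned char>(prefix.back()) + 1);
    return prefix;
}

// Half-open range of all keys in `coll` that begin with `prefix`.
template <typename SortedContainer>
std::pair<typename SortedContainer::const_iterator,
          typename SortedContainer::const_iterator>
prefix_range(const SortedContainer& coll, const std::string& prefix)
{
    const std::string upper = next_prefix(prefix);
    return std::make_pair(coll.lower_bound(prefix),
                          upper.empty() ? coll.end() : coll.lower_bound(upper));
}
```

Both bounds are found by binary search, so the cost is logarithmic in the container size regardless of how many keys match.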

std::map is fine, or std::set if there's no data other than the string. Pass your prefix string into lower_bound to get the first string which sorts at or after that point. Then iterate forward through the map until you hit the end or find an element which doesn't begin with your prefix.
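A minimal sketch of that approach, assuming a std::map<std::string, int> whose mapped int stands in for whatever record data you store; the function name values_with_prefix is made up for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Collect the values of all entries whose key begins with `prefix`.
std::vector<int> values_with_prefix(const std::map<std::string, int>& index,
                                    const std::string& prefix)
{
    std::vector<int> result;
    // lower_bound finds the first key that sorts at or after `prefix`;
    // keys sharing the prefix are contiguous from there on.
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it)
    {
        result.push_back(it->second);
    }
    return result;
}
```

The loop stops at the first key that no longer begins with the prefix, so only matching entries (plus one) are visited.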

Related

Insert a pair into a map and increase the count?

The codebase I'm working in uses the map::operator[] to insert and increment the count of items in that entry by one (this is a knowledge gap for me). Here's an example:
map<string, size_t> namesMap;
namesMap[firstName]++;
What I want to do is tack on an ID to the insert while retaining the increment behavior in the syntax above.
My new map would look like this:
map<string, pair<int, size_t>> namesMapWithID;
I'm struggling to see how to get the equivalent functionality with my new map. This is basically my goal (obviously wrong since "++" cannot be used this way):
namesMapWithID.insert(firstName, make_pair(employeeID, ++));
Is there a better approach that I'm missing?
You can do this by using the insert method along with the iterator/bool pair it returns. This needs only a single lookup (by name): the employee ID is set only when the name is inserted for the first time, and the counter is then incremented in either case.
Something like this:
auto pr = namesMapWithID.insert(std::make_pair(firstName,
std::make_pair(employeeID, size_t())));
++pr.first->second.second;
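For context, here is the same idea wrapped in a small function. The typedef NamesMapWithID and the function name countName are assumptions made for the sketch, not part of the original code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, std::pair<int, std::size_t> > NamesMapWithID;

// Inserts (employeeID, 0) if the name is new, then increments the count.
// If the name already exists, insert() leaves the stored ID untouched.
// Returns the updated count for that name.
std::size_t countName(NamesMapWithID& names,
                      const std::string& firstName, int employeeID)
{
    auto pr = names.insert(
        std::make_pair(firstName, std::make_pair(employeeID, std::size_t())));
    return ++pr.first->second.second;
}
```

Note that a later call with a different ID for the same name keeps the first ID; whether that is the desired behavior depends on your data.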

Check if element exist or not

I have a multimap.
std::multimap<CString, CString> NameInsituteMap;
and I have to write a function which returns true if both the name and the institute match, and false otherwise:
bool InsituteExist( const CString Name, const CString Insitute )
{
}
I can find the key and iterate over all its values to check whether the institute exists.
I want to know if there is a more direct way of doing that, instead of looping through all the elements and comparing.
I am open to use any other data structure than multimap if that makes things nicer.
Use equal_range from multimap.
You can efficiently find the sequence of elements in the multimap for a given key. There is no better way to find a particular value within that sequence than a linear search.
std::map<CString, std::set<CString>> would be an alternative data structure, which is also efficient for finding whether a value exists in the set associated with the key. It has a slightly different interface, though: instead of simply inserting a value for a key, you must "get" the value set of the key and insert into that set.
If the map aspect of your data structure isn't used otherwise, another simpler alternative is to use std::set<std::pair<CString, CString>>. This can be easily used to test if the key-value pair is in the set, but of course lacks other features that the multimap had.
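As a sketch of two of these options, the code below uses std::string in place of CString so that it compiles without MFC; the typedefs and the function name instituteExists are illustrative assumptions.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

typedef std::multimap<std::string, std::string> NameInstituteMap;
typedef std::set<std::pair<std::string, std::string> > NameInstituteSet;

// multimap version: logarithmic lookup of the key, then a linear scan
// over only the values stored under that key.
bool instituteExists(const NameInstituteMap& m,
                     const std::string& name, const std::string& institute)
{
    auto range = m.equal_range(name);
    for (auto it = range.first; it != range.second; ++it)
        if (it->second == institute)
            return true;
    return false;
}

// set-of-pairs version: a single logarithmic lookup of the whole pair.
bool instituteExists(const NameInstituteSet& s,
                     const std::string& name, const std::string& institute)
{
    return s.count(std::make_pair(name, institute)) != 0;
}
```

The multimap version is usually fine when each name has only a handful of institutes; the set version trades away per-key iteration convenience for a direct membership test.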

boost::multi_index composite keys efficiency

Long time reader first time poster! I'm playing around with the boost::multi_index container stuff and have a rather in-depth question that hopefully a boost or C++ container expert might know (my knowledge in C++ containers is pretty basic). For reference, the boost documentation on composite keys can be found here: boost::multi_index composite keys.
When using a composite key, the documentation states that "Composite keys are sorted by lexicographical order, i.e. sorting is performed by the first key, then the second key if the first one is equal, etc". Does this mean that the structure is stored such that a lookup for a specific 2-part composite key will take O(n=1) time, i.e. is the container sorted such that there is a pointer directly to each item, or does the boost container retrieve a list that matches the first part of the composite key and then need to perform a search for items matching the second part of the key and thus is slower?
For example, if I was to maintain two containers manually using two different indices and wanted to find items that matched a specific 2-part query I would probably filter the first container for all items matching the 1st part of the query, and then filter the result for items that match the 2nd part of the query. So this manual method would effectively involve two searches. Does boost effectively do this or does it improve on efficiency somehow via the use of composite keys?
Hopefully I've explained myself here but please ask questions and I will try my best to clarify exactly what I mean!
Lookups involving composite keys do not go through any two-stage process as you describe. composite_key-induced orderings are normal orderings; the only special thing about them is that they depend on two or more element keys rather than one.
Maybe an example will clarify. Consider this use of composite_key:
struct element
{
int x,y,z;
};
typedef multi_index_container<
element,
indexed_by<
ordered_unique<
composite_key<
element,
member<element,int,&element::x>,
member<element,int,&element::y>,
member<element,int,&element::z>
>
>
>
> multi_t;
The resulting container is in a sense equivalent to this:
struct element_cmp
{
bool operator()(const element& v1, const element& v2)const
{
if(v1.x<v2.x)return true;
if(v2.x<v1.x)return false;
if(v1.y<v2.y)return true;
if(v2.y<v1.y)return false;
return v1.z<v2.z;
}
};
typedef std::set<element,element_cmp> set_t;
composite_key automatically generates equivalent code to that in element_cmp::operator(), and additionally allows for lookup on just the first n keys, but the underlying data structure does not change with respect to the case using std::set.
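To make the std::set analogy concrete, here is a standard-library-only sketch (it does not use Boost and is not how composite_key is implemented). It shows a full-key lookup and a lookup on the first key only, both done by descending the same tree. The names element_less, contains and count_x are hypothetical, and the first-key lookup emulates the partial lookup by padding the remaining keys with INT_MIN/INT_MAX.

```cpp
#include <climits>
#include <cstddef>
#include <iterator>
#include <set>
#include <tuple>

struct element { int x, y, z; };

// Same lexicographic ordering as element_cmp, written with std::tie.
struct element_less
{
    bool operator()(const element& a, const element& b) const
    {
        return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
    }
};

typedef std::set<element, element_less> set_t;

// Full-key lookup: a single O(log n) descent, no second search stage.
bool contains(const set_t& s, int x, int y, int z)
{
    return s.find(element{x, y, z}) != s.end();
}

// First-key lookup: two O(log n) bound searches delimit all elements
// with the given x; counting them is linear in the number of matches.
std::size_t count_x(const set_t& s, int x)
{
    auto first = s.lower_bound(element{x, INT_MIN, INT_MIN});
    auto last  = s.upper_bound(element{x, INT_MAX, INT_MAX});
    return static_cast<std::size_t>(std::distance(first, last));
}
```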

Fast search algorithm with std::vector<std::string>

for (std::vector<std::string>::const_iterator it = serverList.begin(); it != serverList.end(); ++it)
{
// found a match, store the location
if (index == *it) // index is a string
{
indexResult.push_back(std::distance(serverList.begin(), it)); // std::vector<unsigned int>
}
}
I have written the above code to look through a vector of strings and return another vector with the locations of any "hits".
Is there a way to do the same, but faster? (If I have 10,000 items in the container, it will take a while).
Please note that I have to check ALL of the items for matches and store its position in the container.
Bonus Kudos: Does anyone know of a way (or links) to make the search find partial matches? (Example: search for "coolro" and store the location of the element "coolroomhere".)
Use a binary search after sorting the vector:
std::sort(serverList.begin(), serverList.end());
std::lower_bound(serverList.begin(), serverList.end(), valueToFind) finds the first matching element.
Use std::equal_range if you want to find all matching elements.
Because lower_bound and equal_range are binary searches, they take logarithmic time, compared with O(N) for your linear search. Note that sorting changes the positions of the elements, so record the original positions first if you need them.
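Since the question asks for positions in the original container, one possible variation is to sort a vector of positions rather than the strings themselves. This is only a sketch; build_sorted_positions, PositionLess and find_positions are names invented here, and it pays off only if the list is searched many times between changes.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Build once: positions of serverList, ordered by the string they refer to.
std::vector<std::size_t> build_sorted_positions(const std::vector<std::string>& serverList)
{
    std::vector<std::size_t> order(serverList.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return serverList[a] < serverList[b]; });
    return order;
}

// Compares a position with a key in either argument order,
// as std::equal_range requires.
struct PositionLess
{
    const std::vector<std::string>& list;
    bool operator()(std::size_t pos, const std::string& key) const { return list[pos] < key; }
    bool operator()(const std::string& key, std::size_t pos) const { return key < list[pos]; }
};

// Original positions of every element equal to `index`, in ascending order.
std::vector<std::size_t> find_positions(const std::vector<std::string>& serverList,
                                        const std::vector<std::size_t>& order,
                                        const std::string& index)
{
    auto range = std::equal_range(order.begin(), order.end(), index,
                                  PositionLess{serverList});
    std::vector<std::size_t> hits(range.first, range.second);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```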
Basically, you're asking if it's possible to check all elements for a
match, without checking all elements. If there is some sort of external
meta-information (e.g. the data is sorted), it might be possible (e.g.
using binary search). Otherwise, by its very nature, to check all
elements, you have to check all elements.
If you're going to do many such searches on the list and the list
doesn't vary, you might consider calculating a second table with a good
hash code of the entries; again depending on the type of data being
looked up, it could be more efficient to calculate the hash code of the
index, and compare hash codes first, only comparing the strings if the
hash codes were equal. Whether this is an improvement or not largely
depends on the size of the table and the type of data in it. You might
also be able to leverage knowledge about the data in the strings; if
they are all URLs, for example, mostly starting with "http://www.",
starting the comparison just after that 11-character prefix, and only
coming back to compare the first 11 characters if all of the rest are
equal, could end up being a big win.
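The hash-table idea above might look roughly like this, assuming std::hash<std::string> is an acceptable hash and the list does not change between searches; hash_all and find_by_hash are hypothetical names. Whether it actually beats plain string comparison should be measured on your data.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Computed once, while serverList stays unchanged.
std::vector<std::size_t> hash_all(const std::vector<std::string>& serverList)
{
    std::hash<std::string> hasher;
    std::vector<std::size_t> hashes;
    hashes.reserve(serverList.size());
    for (const std::string& s : serverList)
        hashes.push_back(hasher(s));
    return hashes;
}

// Still visits every element, but compares integers first and falls
// back to a string comparison only when the hash codes are equal.
std::vector<std::size_t> find_by_hash(const std::vector<std::string>& serverList,
                                      const std::vector<std::size_t>& hashes,
                                      const std::string& index)
{
    const std::size_t wanted = std::hash<std::string>()(index);
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < serverList.size(); ++i)
        if (hashes[i] == wanted && serverList[i] == index)
            hits.push_back(i);
    return hits;
}
```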
With regards to finding substrings, you can use std::search for each
element:
for ( auto iter = serverList.begin();
iter != serverList.end();
++ iter ) {
if ( std::search( iter->begin(), iter->end(),
index.begin(), index.end() ) != iter->end() ) {
indexResult.push_back( iter - serverList.begin() );
}
}
Depending on the number of elements being searched and the lengths of
the strings involved, it might be more efficient to use something like
a Boyer-Moore (BM) search, precompiling the search string into the
necessary tables before entering the loop.
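If a C++17 compiler is available, the standard library provides std::boyer_moore_searcher for this. The sketch below assumes C++17 and uses a made-up function name, find_substring_positions; for short strings the plain std::search may well be just as fast.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Positions of all elements of serverList that contain `index` as a substring.
std::vector<std::size_t> find_substring_positions(const std::vector<std::string>& serverList,
                                                  const std::string& index)
{
    // Build the Boyer-Moore tables for the search string once, outside the loop.
    const std::boyer_moore_searcher<std::string::const_iterator>
        searcher(index.begin(), index.end());

    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < serverList.size(); ++i)
    {
        const std::string& s = serverList[i];
        if (std::search(s.begin(), s.end(), searcher) != s.end())
            hits.push_back(i);
    }
    return hits;
}
```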
If you make the container a std::map instead of a std::vector, the underlying data structure used will be one that is optimized for doing keyword searches like this.
If you instead use a std::multimap, the member function equal_range() will return a pair of iterators covering every match in the map. That sounds to me like what you want.
A smart commenter below points out that if you don't actually store any more information than the name (the search key), then you should probably instead use a std::multiset.

Given a pair<string, string>, how to find if it is part of some pair in a map<string, string>?

We have a pair of strings, for example Accept-Language : RU, and we search through a map, for example of HTTP request headers. All we need to know is whether such a pair is in the map or not (a bool value). How can we do a soft search, meaning that we do not need to find exactly the same pair: a pair like Accept-Language : ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4 is also valid for us, and if it exists we can consider that our map contains our pair. How can I write a function that performs such a search in C++?
First of all, if you are using a map, you cannot have multiple entries with the same key. For example, you can't have both Accept-Language : RU and Accept-Language : ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4 because they have the same key 'Accept-Language'. Perhaps in your case you should use a vector of pairs, or a multimap.
Next, your question consists of 2 parts:
1. How to check whether some element (such as a string or a pair) matches a pattern.
2. Assuming you have such a check, how to apply it to each element in a container.
The solutions to each part:
1. You can implement a function that takes a string or a pair (depending on the type of container and stored element that you choose) and checks whether it matches your criteria. Functions such as string::find_first_of can be useful for this. Regular expressions can be even more helpful; note that std::regex has only been part of the standard library since C++11.
2. You can apply this function to every element of your container using the find_if algorithm.
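A minimal sketch of both parts, assuming a multimap of headers and one possible definition of "soft" matching: the keys are equal and the wanted value occurs somewhere in the stored value, ignoring ASCII case. The names to_lower, soft_match and contains_soft are hypothetical, and a real Accept-Language check would need proper parsing of the header value.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <utility>

// Lower-case copy (ASCII only), so that "RU" can match "ru-RU".
std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Part 1: the match criterion for a single element.
bool soft_match(const std::pair<const std::string, std::string>& entry,
                const std::pair<std::string, std::string>& wanted)
{
    return entry.first == wanted.first &&
           to_lower(entry.second).find(to_lower(wanted.second)) != std::string::npos;
}

// Part 2: apply the criterion to every element with find_if.
bool contains_soft(const std::multimap<std::string, std::string>& headers,
                   const std::pair<std::string, std::string>& wanted)
{
    return std::find_if(headers.begin(), headers.end(),
                        [&](const std::pair<const std::string, std::string>& e)
                        { return soft_match(e, wanted); }) != headers.end();
}
```

Since the key comparison is exact, the find_if scan could be narrowed to headers.equal_range(wanted.first) if the container is large.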