How can I efficiently implement the C++ function below in Rust? The data structure must be tree-based (BTree, RBTree, etc.).
Given a sorted map m, a key target, and a value val.
Find the lower_bound entry (the first key >= target). Return DEFAULT if there is no such entry.
If the value of the found entry <= val and it has previous entry, return value of previous entry.
If the value of the found entry > val and it has next entry, return value of the next entry.
Otherwise, return the found value.
template<class K, class V>
V find_neighbor(const std::map<K, V>& m, const K& target, const V& val) {
auto it = m.lower_bound(target);
if( it == m.end() ) return V{}; // DEFAULT value.
if( it->second <= val && it != m.begin() )
return (--it)->second; // return previous value
if( it->second > val && it != (--m.end()) )
return (++it)->second; // return next value
return it->second; // return target value
}
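To pin down what the specification means in practice, here is a minimal sketch that restates the function with std::prev/std::next and walks one case per branch. The helper name check_find_neighbor and the sample map are my own assumptions, not part of the original code.

```cpp
#include <cassert>
#include <iterator>
#include <map>

// Same logic as the function above, written with std::prev/std::next.
template <class K, class V>
V find_neighbor(const std::map<K, V>& m, const K& target, const V& val) {
    auto it = m.lower_bound(target);             // first key >= target
    if (it == m.end()) return V{};               // no such entry: DEFAULT
    if (it->second <= val && it != m.begin())
        return std::prev(it)->second;            // previous value
    if (it->second > val && std::next(it) != m.end())
        return std::next(it)->second;            // next value
    return it->second;                           // found value
}

// Hypothetical helper: one assertion per branch of the specification.
void check_find_neighbor() {
    const std::map<int, int> m = {{1, 5}, {2, 3}, {3, 8}};
    assert(find_neighbor(m, 3, 10) == 3);  // 8 <= 10, previous entry is (2, 3)
    assert(find_neighbor(m, 1, 10) == 5);  // 5 <= 10, but (1, 5) is the first entry
    assert(find_neighbor(m, 2, 1) == 8);   // 3 > 1, next entry is (3, 8)
    assert(find_neighbor(m, 3, 1) == 8);   // 8 > 1, but (3, 8) is the last entry
    assert(find_neighbor(m, 4, 0) == 0);   // no key >= 4, so V{} (0 for int)
}
```

The same five cases can be mirrored as tests for a port to another language.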
That's what I've got.
Create a trait FindNeighbor that adds the function find_neighbor to all BTreeMaps.
I'm quite confused about what the algorithm does, though, tbh. But it should (tm) behave identically to the C++ version.
If you use this in an actual project though, for the love of god, please write unit tests for it. 😄
use std::{borrow::Borrow, collections::BTreeMap};
trait FindNeighbor<K, V> {
type Output;
fn find_neighbor(&self, target: K, val: V) -> Self::Output;
}
impl<K, V, KI, VI> FindNeighbor<K, V> for BTreeMap<KI, VI>
where
K: Borrow<KI>,
V: Borrow<VI>,
KI: Ord,
VI: Default + PartialOrd + Clone,
{
type Output = VI;
fn find_neighbor(&self, target: K, val: V) -> VI {
let val: &VI = val.borrow();
let target: &KI = target.borrow();
let mut it = self.range(target..);
match it.next() {
None => VI::default(),
Some((_, it_value)) => {
if it_value <= val {
match self.range(..target).rev().next() {
Some((_, prev_val)) => prev_val.clone(),
None => it_value.clone(),
}
} else {
match it.next() {
Some((_, next_val)) => next_val.clone(),
None => it_value.clone(),
}
}
}
}
}
}
fn main() {
let map = BTreeMap::from([(1, 5), (2, 3), (3, 8)]);
println!("{:?}", map.find_neighbor(3, 10));
}
3
Note a couple of differences between C++ and Rust:
There are trait annotations on the generic parameters. Generic functions work a little differently than C++ templates: every capability used inside a generic method has to be declared as a trait bound. The advantage is that generics are then guaranteed to work with every type they accept, so no surprising compiler errors appear at the point of use. (C++ templates are more like duck typing, while Rust generics are strongly typed.)
We implement a trait that adds new functionality to an external struct. That is something that also doesn't exist in C++, and tbh I really like this mechanic in Rust.
How to get the index of each entry in a Map<K, V> in Dart?
Specifically, can the index of each entry be printed out, if a map function is run on the object as shown in the below example?
e.g. how could I print out:
MapEntry(Spring: 1) is index 0
MapEntry(Chair: 9) is index 1
MapEntry(Autumn: 3) is index 2
etc.
Map<String, int> exampleMap = {"Spring" : 1, "Chair" : 9, "Autumn" : 3};
void main() {
exampleMap.entries.map((e) { return print(e);}).toList(); ///print each index here
}
Note: I can get the index with a List object (e.g. by using exampleList.indexOf(e)), but I am unsure how to do this when working with a Map object.
You can use forEach and a variable to track the current index:
Map<String, int> exampleMap = {"Spring": 1, "Chair": 9, "Autumn": 3};
void main() {
int i = 0;
exampleMap.forEach((key, value) {
print(i.toString());
print(key);
print(value);
i++;
});
}
Suppose we have a data structure that is a key-value map, where the key itself is again a key-value map. For example:
map<map<string,string>, string>
Now, suppose that we want to query all top-level key/values in this map matching a certain subset of the key-values of the key. Example:
map = { { "k1" : "v1", "k2" : "v2" } : "value1",
{ "k1" : "v3", "k2" : "v4" } : "value2",
{ "k1" : "v1", "k2" : "v5" } : "value3"
}
And our query is "give me all key-values where key contains { "k1" : "v1" }", which would return the first and third value. Similarly, querying for { "k1" : "v3", "k2" : "v4" } would return all key-values that have both k1=v3 and k2=v4, yielding the second value. Obviously we could search through the full map on every query, but I'm looking for something more efficient than that.
I have looked around, but can't find an efficient, easy-to-use solution out there for C++. Boost multi_index does not seem to have this kind of flexibility in querying subsets of key-value pairs.
Some databases have ways to create indices that can answer exactly this kind of query. For example, Postgres has GIN indices (generalized inverted indices) that allow you to ask
SELECT * FROM table WHERE some_json_column @> '{"k1":"v1","k2":"v2"}'
-- returns all rows that have both k1=v1 and k2=v2
However, I'm looking for a solution without databases just in C++. Is there any library or data structure out there that can accomplish something like this? In case there is none, some pointers on a custom implementation?
I would stay with the database index analogy. In that analogy, the indexed search does not use a generic k=v type search, but just a tuple with the values for the elements (generally columns) that constitute the index. The database then reverts to scans for the other k=v parameters that are not in the index.
In that analogy, you would have a fixed number of keys, so their values could be represented as a fixed-size array of strings. The good news is that it is then trivial to set a global order on the keys, and thanks to the std::map::upper_bound method, it is also trivial to find an iterator immediately after a partial key.
So getting a full key is immediate: just extract it with find, at or operator []. And getting all elements for a partial key is still simple:
find an iterator starting above the partial key with upper_bound
iterate forward while the element matches the partial key
But this requires that you change your initial type to std::map<std::array<string, N>, string>
You could build an API over this container using std::map<string, string> as input values, extract the actual full or partial key from that, and iterate as above, keeping only elements matching the k,v pairs not present in index.
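As a rough illustration of the prefix scan described above, here is a minimal sketch assuming N = 2 and that the array holds the values for k1 and k2 in that fixed order. The names Key, Table and query_prefix are hypothetical. It uses lower_bound rather than upper_bound (my substitution), so that an entry exactly equal to the padded probe key is not skipped.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

constexpr std::size_t N = 2;                  // assumed number of key fields
using Key   = std::array<std::string, N>;     // values for k1, k2 in fixed order
using Table = std::map<Key, std::string>;

// Returns the mapped values of all entries whose key starts with `prefix`.
// Assumes prefix.size() <= N.
std::vector<std::string> query_prefix(const Table& t,
                                      const std::vector<std::string>& prefix) {
    Key probe{};  // all fields empty; "" sorts before every other string
    std::copy(prefix.begin(), prefix.end(), probe.begin());

    std::vector<std::string> out;
    // Start at the first key >= probe and walk while the prefix still matches.
    for (auto it = t.lower_bound(probe);
         it != t.end() &&
         std::equal(prefix.begin(), prefix.end(), it->first.begin());
         ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

This only helps for queries on a leading subset of the fields; a query on k2 alone would still need a scan or a second index.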
You could use std::includes to check if key maps include another map of queried key-value pairs.
I am unsure how to avoid checking every key-map though. Maybe other answers have a better idea.
template <typename MapOfMapsIt, typename QueryMapIt>
std::vector<MapOfMapsIt> query_keymap_contains(
MapOfMapsIt mom_fst,
MapOfMapsIt mom_lst,
QueryMapIt q_fst,
QueryMapIt q_lst)
{
std::vector<MapOfMapsIt> out;
for(; mom_fst != mom_lst; ++mom_fst)
{
const auto& key_map = mom_fst->first; // bind by reference to avoid copying the inner map
if(std::includes(key_map.begin(), key_map.end(), q_fst, q_lst))
out.push_back(mom_fst);
}
return out;
}
Usage:
typedef std::map<std::string, std::string> StrMap;
typedef std::map<StrMap, std::string> MapKeyMaps;
MapKeyMaps m = {{{{"k1", "v1"}, {"k2", "v2"}}, "value1"},
{{{"k1", "v3"}, {"k2", "v4"}}, "value2"},
{{{"k1", "v1"}, {"k2", "v5"}}, "value3"}};
StrMap q1 = {{"k1", "v1"}};
StrMap q2 = {{"k1", "v3"}, {"k2", "v4"}};
auto res1 = query_keymap_contains(m.begin(), m.end(), q1.begin(), q1.end());
auto res2 = query_keymap_contains(m.begin(), m.end(), q2.begin(), q2.end());
std::cout << "Query1: ";
for(auto i : res1) std::cout << i->second << " ";
std::cout << "\nQuery2: ";
for(auto i : res2) std::cout << i->second << " ";
Output:
Query1: value1 value3
Query2: value2
I believe the efficiency of different methods will depend on actual data. However, I would consider making a "cache" of iterators to outer map elements for particular "kX","vY" pairs as follows:
using M = std::map<std::map<std::string, std::string>, std::string>;
M m = {
{ { { "k1", "v1" }, { "k2", "v2" } }, "value1" },
{ { { "k1", "v3" }, { "k2", "v4" } }, "value2" },
{ { { "k1", "v1" }, { "k2", "v5" } }, "value3" }
};
std::map<M::key_type::value_type, std::vector<M::iterator>> cache;
for (auto it = m.begin(); it != m.end(); ++it)
for (const auto& kv : it->first)
cache[kv].push_back(it);
Now, you basically need to take all searched "kX","vY" pairs and find the intersection of cached iterators for them:
std::vector<M::key_type::value_type> find_list = { { "k1", "v1" }, { "k2", "v5" } };
std::vector<M::iterator> found;
if (find_list.size() > 0) {
auto it = find_list.begin();
std::copy(cache[*it].begin(), cache[*it].end(), std::back_inserter(found));
while (++it != find_list.end()) {
const auto& temp = cache[*it];
found.erase(std::remove_if(found.begin(), found.end(),
[&temp](const auto& e){ return std::find(temp.begin(), temp.end(), e) == temp.end(); } ),
found.end());
}
}
The final output:
for (const auto& it : found)
std::cout << it->second << std::endl;
gives value3 in this case.
A live demo: https://wandbox.org/permlink/S9Zp8yofSvjfLokc.
Note that the complexity of the intersection step is quite large, since cached iterators are unsorted. If you use pointers instead, you can sort the vectors or store the pointers in a map instead, which would allow you to find intersections much faster, e.g., by using std::set_intersection.
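A minimal sketch of that sorted variant, assuming pointers to the outer map's elements are stored instead of iterators. The names KV, Entry, Index, build_index and query are hypothetical. std::less is used for both sorting and intersecting because it guarantees a total order over pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Inner = std::map<std::string, std::string>;
using Outer = std::map<Inner, std::string>;
using KV    = std::pair<std::string, std::string>;
using Entry = const Outer::value_type*;   // stable while the element is not erased
using Index = std::map<KV, std::vector<Entry>>;

// Inverted index: each (k, v) pair maps to a sorted list of entries containing it.
Index build_index(const Outer& m) {
    Index idx;
    for (const auto& entry : m)
        for (const auto& kv : entry.first)
            idx[KV(kv.first, kv.second)].push_back(&entry);
    for (auto& bucket : idx)
        std::sort(bucket.second.begin(), bucket.second.end(), std::less<Entry>{});
    return idx;
}

// Intersects the sorted lists of all wanted pairs; an empty query yields nothing.
std::vector<Entry> query(const Index& idx, const std::vector<KV>& wanted) {
    std::vector<Entry> found;
    for (std::size_t i = 0; i < wanted.size(); ++i) {
        auto it = idx.find(wanted[i]);
        if (it == idx.end()) return {};           // pair never seen: no match
        if (i == 0) { found = it->second; continue; }
        std::vector<Entry> narrowed;
        std::set_intersection(found.begin(), found.end(),
                              it->second.begin(), it->second.end(),
                              std::back_inserter(narrowed), std::less<Entry>{});
        found.swap(narrowed);
    }
    return found;
}
```

The index has to be rebuilt or updated whenever the outer map changes, which is the usual price of this kind of cache.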
You can do it with a single (partial) pass through each element with an ordered query, returning early as much as possible. Taking inspiration from std::set_difference, we want to know if query is a subset of data, which lets us select entries of the outer map.
// Is the sorted range [first1, last1) a subset of the sorted range [first2, last2)
template<class InputIt1, class InputIt2>
bool is_subset(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2)
{
while (first1 != last1) {
if (first2 == last2) return false; // Reached the end of data with query still remaining
if (*first1 < *first2) {
return false; // didn't find this query element
} else {
if (! (*first2 < *first1)) {
++first1; // found this query element
}
++first2;
}
}
return true; // reached the end of query
}
// find every element of "map-of-maps" [first2, last2) for which the sorted range [first1, last1) is a subset of its key
template<class InputIt1, class InputIt2, class OutputIt>
OutputIt query_data(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2, OutputIt d_first)
{
auto item_matches = [=](auto & inner){ return is_subset(first1, last1, inner.first.begin(), inner.first.end()); };
return std::copy_if(first2, last2, d_first, item_matches);
}
std::map is implemented as a balanced binary tree, which has O(log n) look-up. What you need instead is std::unordered_map, which is implemented as a hash table and has O(1) look-ups on average.
Now let me rephrase your wording, you want to:
And our query is "give me all key-values where key contains { "k1" : "v1" }", which would return the first and third value.
Which translates to:
If the key-value pair given is in the inner map, give me back its value.
Essentially what you need is a double look-up, which std::unordered_map excels at.
Here is a code snippet that solves your problem with the standard library (no fancy code required):
#include <iostream>
#include <unordered_map>
#include <string>
int main() {
using elemType = std::pair<std::string, std::string>;
using innerMap = std::unordered_map<std::string, std::string>;
using myMap = std::unordered_map<std::string, innerMap>;
auto table = myMap{ { "value1", { {"k1", "v1"}, {"k2", "v2"} } },
{ "value2", { {"k1", "v3"}, {"k2", "v4"} } },
{ "value3", { {"k1", "v1"}, {"k2", "v5"} } } };
//First we set-up a predicate lambda
auto printIfKeyValueFound = [](const myMap& tab, const elemType& query) {
// O(n) for the first table and O(1) lookup for each, O(n) total
for(const auto& el : tab) {
auto it = el.second.find(query.first);
if(it != el.second.end()) {
if(it->second == query.second) {
std::cout << "Element found: " << el.first << "\n";
}
}
}
};
auto query = elemType{"k1", "v1"};
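printIfKeyValueFound(table, query);
}
Output: value3, value1 (each on its own line after "Element found: "; the order may vary, since unordered_map iteration order is unspecified)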
For queries of arbitrary size you can:
//First we set-up a predicate lambda
auto printIfKeyValueFound = [](const myMap& tab, const std::vector<elemType>& query) {
// O(n) entries in the table, O(q) elements in the query, O(1) average search for each
// O(n * q) total
for(const auto& el : tab) {
bool found = true;
for(const auto& queryEl : query) {
auto it = el.second.find(queryEl.first);
// a missing key counts as a mismatch, just like a differing value
if(it == el.second.end() || it->second != queryEl.second) {
found = false;
break;
}
}
if(found)
std::cout << el.first << "\n";
}
};
auto query = std::vector<elemType>{ {"k1", "v1"}, {"k2", "v2"} };
Output: value1
I want to return a range from a function that represents a view on a STL collection, something like this:
auto createRange() {
std::unordered_set<int> is = {1, 2, 3, 4, 5, 6};
return is | view::transform([](auto&& i) {
return i;
});
}
However, view::transform does not take ownership of is, so when I run this, there is undefined behavior, because is is freed when createRange exits.
int main(int argc, char* argv[]) {
auto rng = createRange();
ranges::for_each(rng, [](auto&& i) {
std::cout << std::to_string(i) << std::endl;
});
}
If I try std::move(is) as the input, I get a static assert indicating that I can't use rvalue references as inputs to a view. Is there any way to ensure that the view takes ownership of the collection?
Edit: Some Additional Info
I want to add some clarifying info. I have a stream of data, data, and a view on it that transforms each item into a struct, Foo, that looks something like this:
struct Foo {
std::string name;
std::unordered_set<int> values;
};
// Take the input stream and turn it into a range of Foos
auto foos = data | asFoo();
What I want to do is create a range of std::pair<std::string, int> by distributing the name throughout the values. My naive attempt looks something like this:
auto result = data | asFoo() | view::transform([](auto&& foo) {
const auto& name = foo.name;
const auto& values = foo.values;
return values | view::transform([name](auto&& value) {
return std::make_pair(name, value);
});
}) | view::join;
However, this results in undefined behavior because values is freed. The only way that I have been able to get around this is to make values a std::shared_ptr and to capture it in the lambda passed to view::transform to preserve its lifetime. That seems like an inelegant solution.
I think what I am looking for is a view that will take ownership of the source collection, but it does not look like range-v3 has that.
Alternatively, I could just create the distributed version using a good old fashioned for-loop, but that does not appear to work with view::join:
auto result = data | asFoo() | view::transform([](auto&& foo) {
const auto& name = foo.name;
const auto& values = foo.values;
std::vector<std::pair<std::string, int>> distributedValues;
for (const auto& value : values) {
distributedValues.emplace_back(name, value);
}
return distributedValues;
}) | view::join;
Even if this did work with view::join, I think the mixed metaphor of ranges and loops is inelegant.
Views do not own the data they present. If you need to ensure the persistence of the data, then the data itself needs to be preserved.
auto createRange() {
//We're using a pointer to ensure that the contents don't get moved around, which might invalidate the view
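// make_unique cannot deduce a braced list, so the initializer_list type is spelled out
std::unique_ptr<std::unordered_set<int>> is_ptr = std::make_unique<std::unordered_set<int>>(std::initializer_list<int>{1,2,3,4,5,6});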
auto & is = *is_ptr;
auto view = is | view::transform([](auto&& i) {return i;});
return std::make_pair(view, std::move(is_ptr));
}
int main() {
auto[rng, data_ptr] = createRange();
ranges::for_each(rng, [](auto&& i) {
std::cout << std::to_string(i) << std::endl;
});
}
An alternate method is to make sure the function is provided the data set from which the view will be created:
auto createRange(std::unordered_set<int> & is) {
return is | view::transform([](auto&& i) {return i;});
}
int main() {
std::unordered_set<int> is = {1,2,3,4,5,6};
auto rng = createRange(is);
ranges::for_each(rng, [](auto&& i) {
std::cout << std::to_string(i) << std::endl;
});
}
Either approach should broadly reflect what your project will need to do: keep the data alive for at least as long as the view that refers to it.
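A third possibility, sketched here in plain C++17 and not taken from range-v3, is a small wrapper that owns the container and applies the function lazily during iteration. The class name OwningTransform and the times_ten lambda are hypothetical, and the wrapper only supports a range-based for loop; it does not compose with range-v3 pipelines.

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>

// Owns a container and exposes its elements through `func` while iterating.
template <class Container, class Func>
class OwningTransform {
public:
    OwningTransform(Container data, Func func)
        : data_(std::move(data)), func_(std::move(func)) {}

    class iterator {
    public:
        iterator(typename Container::const_iterator it, const Func* func)
            : it_(it), func_(func) {}
        decltype(auto) operator*() const { return (*func_)(*it_); }
        iterator& operator++() { ++it_; return *this; }
        bool operator!=(const iterator& other) const { return it_ != other.it_; }
    private:
        typename Container::const_iterator it_;
        const Func* func_;   // points into the owning wrapper
    };

    // Iterators stay valid only while this wrapper is alive and not moved from.
    iterator begin() const { return iterator(data_.begin(), &func_); }
    iterator end() const { return iterator(data_.end(), &func_); }

private:
    Container data_;
    Func func_;
};

auto createRange() {
    std::unordered_set<int> is = {1, 2, 3, 4, 5, 6};
    auto times_ten = [](int i) { return i * 10; };
    // The set is moved into the returned object, so nothing dangles.
    return OwningTransform<std::unordered_set<int>, decltype(times_ten)>(
        std::move(is), times_ten);
}
```

With this, `for (int v : createRange())` is safe, because the returned object carries the set with it.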
The following code groups values for any container with a generic grouping lambda:
template<class Iterator, class GroupingFunc,
class T = remove_reference_t<decltype(*declval<Iterator>())>,
class GroupingType = decltype(declval<GroupingFunc>()(declval<T&>()))>
auto groupValues(Iterator begin, Iterator end, GroupingFunc groupingFunc) {
map<GroupingType, list<T>> groups;
for_each(begin, end,
[&groups, groupingFunc](const auto& val){
groups[groupingFunc(val)].push_back(val);
} );
return groups;
}
With the following usage:
int main() {
list<string> strs = {"hello", "world", "Hello", "World"};
auto groupOfStrings =
groupValues(strs.begin(), strs.end(),
[](auto& val) {
return (char)toupper(val.at(0));
});
print(groupOfStrings); // assume a print method
list<int> numbers = {1, 5, 10, 24, 13};
auto groupOfNumbers =
groupValues(numbers.begin(), numbers.end(),
[](int val) {
int decile = int(val / 10) * 10;
return to_string(decile) + '-' + to_string(decile + 9);
});
print(groupOfNumbers); // assume a print method
}
I am a bit reluctant regarding the (over?)-use of declval and decltype in groupValues function.
Do you see a better way for writing it?
(Question is mainly for better style and clarity unless of course you see any other issue).
Code: http://coliru.stacked-crooked.com/a/f65d4939b402a750
I would probably move the last two template parameters inside the function, and use std::result_of to give a slightly more tidy function:
template <typename T>
using deref_iter_t = std::remove_reference_t<decltype(*std::declval<T>())>;
template<class Iterator, class GroupingFunc>
auto groupValues(Iterator begin, Iterator end, GroupingFunc groupingFunc) {
using T = deref_iter_t<Iterator>;
using GroupingType = std::result_of_t<GroupingFunc(T&)>;
std::map<GroupingType, std::list<T>> groups;
std::for_each(begin, end, [&groups, groupingFunc](const auto& val){
groups[groupingFunc(val)].push_back(val);
});
return groups;
}
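One caveat: std::result_of is deprecated in C++17 and removed in C++20, where std::invoke_result_t takes its place. A minimal C++17 sketch along the same lines, assuming the iterator provides std::iterator_traits and using a plain loop instead of std::for_each (both my own choices):

```cpp
#include <cassert>
#include <cctype>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <type_traits>

template <class Iterator, class GroupingFunc>
auto groupValues(Iterator begin, Iterator end, GroupingFunc groupingFunc) {
    using T = typename std::iterator_traits<Iterator>::value_type;
    // Key type: whatever groupingFunc returns for a const T&, minus cv/ref.
    using GroupingType =
        std::decay_t<std::invoke_result_t<GroupingFunc&, const T&>>;

    std::map<GroupingType, std::list<T>> groups;
    for (; begin != end; ++begin) {
        const T& val = *begin;
        groups[groupingFunc(val)].push_back(val);
    }
    return groups;
}

// Hypothetical grouping functions mirroring the usage in the question.
inline char initialOf(const std::string& s) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(s.at(0))));
}

inline std::string decileOf(int val) {
    const int decile = (val / 10) * 10;
    return std::to_string(decile) + '-' + std::to_string(decile + 9);
}
```

Calling groupValues(strs.begin(), strs.end(), initialOf) on the list from the question groups the strings under 'H' and 'W'.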