Database like C++ Data Structure - c++

I looked over questions like this one, but it seems most answers suggest that one use something like sqlite with an in memory database. Perhaps that is the only real solution but I'll ask regardless.
I have a number of records that look like this :
struct record
{
int id;
int type;
// bunch of other data
};
The problem involves storing a number of these in a data structure that would allow for efficient runtime queries like
GetAllForId(int)
GetAllOfType(int)
GetAllOfTypeAndId(int,int)
There can be multiple records of 'type' for a given 'id'
There can be multiple records of 'type' for different 'id'
I also want to be able to easily modify the values in the results of any
GetAllOfTypeAndId(int,int)
And of course, make insertions and deletions at a (preferably) low cost, although insertions are infrequent, so I can eat a bit of cost there.
For reference, I have tried the following solutions :
multimap<type,record>
Then just iterate over all records to find the ones of the relevant type. This feels really cumbersome, especially for GetAllOfTypeAndId queries, but it is good for GetAllForId queries.
map<id,map<type,record>>
Allows for good GetAllOfTypeAndId queries but fails at providing decent access for GetAllOfType.
Unfortunately, because of the nature of this project, I may not be able to use a relational database system, even if it's in memory.

It sounds like you want Boost.MultiIndex. Joaquín M López Muñoz has provided an excellent data structure which allows indexes across multiple fields, member functions, or other properties of a struct/class.
In your case, we could do something like:
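#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>
struct record
{
int id;
int type;
// bunch of other data
};
struct id_tag {};
struct type_tag {};
using record_container = boost::multi_index_container<
record,
boost::multi_index::indexed_by<
// Unique index: assumes each id appears once. Use hashed_non_unique
// if several records can share an id, as described in the question.
boost::multi_index::hashed_unique<
boost::multi_index::tag<id_tag>,
boost::multi_index::member<record, int, &record::id>
>,
boost::multi_index::hashed_non_unique<
boost::multi_index::tag<type_tag>,
boost::multi_index::member<record, int, &record::type>
>
>
>;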
We can now access the data from a unique, hashed index of the ID (a regular hashmap, like unordered_map), or a non-unique, hashed index of the type (like unordered_multimap) using views of the data.
record_container data;
// .... Fill with data
// Find by ID
auto& id_view = data.get<id_tag>();
auto id_it = id_view.find(15);
if (id_it == id_view.end()) {
// not found
} else {
// found
}
You may also create composite keys and do complex logic fairly easily with such a container:
using composite = boost::multi_index::composite_key<
record,
boost::multi_index::member<record, int, &record::id>,
boost::multi_index::member<record, int, &record::type>
>;
Edit
If you cannot use Boost, I have a fork of MultiIndex which uses C++11 features and does not require Boost. It solely depends on a small subset of Brigand, a C++11, template metaprogramming, header-only library.
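If neither Boost nor a fork is an option, the same idea can be hand-rolled with the standard library. The following is only a minimal sketch, not a replacement for MultiIndex: RecordStore, its member names, and the payload field are hypothetical. Records live in a std::list (so their addresses stay stable) and two std::unordered_multimap indexes point into that list, both non-unique to match the question.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <unordered_map>
#include <vector>

struct record {
    int id;
    int type;
    int payload;  // stands in for the "bunch of other data"
};

// Hypothetical hand-rolled two-index container (not Boost.MultiIndex).
class RecordStore {
public:
    using iterator = std::list<record>::iterator;

    // Stores a copy of r and registers it in both indexes.
    iterator insert(const record& r) {
        records_.push_back(r);
        iterator pos = std::prev(records_.end());
        by_id_.emplace(r.id, pos);
        by_type_.emplace(r.type, pos);
        return pos;
    }

    // Removes the record from both indexes, then from the list.
    void erase(iterator pos) {
        assert(pos != records_.end());
        unindex(by_id_, pos->id, pos);
        unindex(by_type_, pos->type, pos);
        records_.erase(pos);
    }

    // The returned pointers allow in-place edits of the payload.
    // Do not change id or type through them: the indexes would go stale.
    std::vector<record*> GetAllForId(int id) { return collect(by_id_, id); }
    std::vector<record*> GetAllOfType(int type) { return collect(by_type_, type); }

    // Uses the id index, then filters the candidates by type.
    std::vector<record*> GetAllOfTypeAndId(int type, int id) {
        std::vector<record*> out;
        for (record* r : collect(by_id_, id))
            if (r->type == type) out.push_back(r);
        return out;
    }

private:
    using Index = std::unordered_multimap<int, iterator>;

    static std::vector<record*> collect(Index& index, int key) {
        std::vector<record*> out;
        auto range = index.equal_range(key);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(&*it->second);
        return out;
    }

    static void unindex(Index& index, int key, iterator pos) {
        auto range = index.equal_range(key);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == pos) { index.erase(it); return; }
        }
    }

    std::list<record> records_;
    Index by_id_;
    Index by_type_;
};
```

Lookups by id or by type are average constant time plus the size of the result. The combined query costs as much as the id lookup, which is fine as long as few records share an id.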

You could use a class and container and its iterators.
class DB
{
private:
typedef list<Record>::iterator iterator;
list<Record> records;
iterator itForId;
public:
// return iterator of first record with id == parameter
iterator GetFirstForId(int id);
iterator GetNextForId(int id);
iterator end() {return records.end();}
};
Deleting records can mess with the iterators so it might be better to mark records for deletion and later batch delete when not iterating through a container.
class DataRecord : public Record
{
bool markedForDeletion;
};
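A minimal sketch of the mark-then-purge idea follows. The method names markForDeletion, purge, and liveCount are assumptions for illustration, and Record is reduced to two fields.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct Record {
    int id;
    int type;
};

struct DataRecord : Record {
    bool markedForDeletion;
    DataRecord(int id, int type) : Record{id, type}, markedForDeletion(false) {}
};

class DB {
public:
    void add(int id, int type) { records_.emplace_back(id, type); }

    // Flags every live record with the given id. Nothing is erased here,
    // so iterators held elsewhere stay valid.
    std::size_t markForDeletion(int id) {
        std::size_t marked = 0;
        for (DataRecord& r : records_) {
            if (r.id == id && !r.markedForDeletion) {
                r.markedForDeletion = true;
                ++marked;
            }
        }
        return marked;
    }

    // Erases all flagged records in one pass. Call this only when no
    // iteration over the container is in progress.
    std::size_t purge() {
        const std::size_t before = records_.size();
        records_.remove_if([](const DataRecord& r) { return r.markedForDeletion; });
        return before - records_.size();
    }

    std::size_t size() const { return records_.size(); }

    // Number of records that are not flagged for deletion.
    std::size_t liveCount() const {
        std::size_t n = 0;
        for (const DataRecord& r : records_)
            if (!r.markedForDeletion) ++n;
        return n;
    }

private:
    std::list<DataRecord> records_;
};
```

Query functions would need to skip flagged records until purge() has run.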

Related

Fastest way to find two unrelated objects in many-to-many relationship and implementation in C++

The basic setting is that, suppose we have two simple classes:
class User {
private:
int id;
public:
User(int id) { this->id = id; }
};
class Channel {
private:
std::string name;
public:
Channel(std::string name) { this->name = name; }
};
These two classes are supposed to be in a many-to-many relationship with each other, e.g. each User can join many Channels and each Channel can have many Users.
Number of Channel objects: a few hundred. Number of User objects: tens of thousands.
Task formulation: Given a particular Channel object, I must find a User which is unrelated to it, as fast as possible.
Question:
1) What would be the optimal realisation of such a many-to-many relationship, considering the given task? Is there any specific algorithm for such a problem (other than straightforward iteration through all relations)?
2) How to implement this, if relations are supposed to have some additional properties, e.g. store the time when User joined Channel?
My thoughts: The first idea was create some additional class like
class UserChannelRel {
private:
User* userPtr;
Channel* chPtr;
float timeJoined;
public:
UserChannelRel(User* user, Channel* channel, float tm) {
this->userPtr = user;
this->chPtr = channel;
this->timeJoined = tm;
}
};
And store those in some big standard container (vector?). But then iterating through all the elements seems pretty slow.
First you can create two repositories to hold the full list of users on one side and the full list of channels on the other. Typically, you'd do this with maps:
map<int, User> users;
map<std::string, Channel> channels;
Then, I'd propose to have for each channel a set of users:
class Channel {
private:
std::string name;
std::set<int> subscribers;
public:
Channel(std::string name):name(name) { };
void add(int userid) {
subscribers.insert(userid);
}
};
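If each relation also needs its own data, such as the join time from the second part of the question, the subscriber set could become a map from user id to that data. This is only a sketch under that assumption; the has and timeJoined members are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

class Channel {
public:
    explicit Channel(std::string name) : name_(std::move(name)) {}

    // Records the subscription and its join time.
    // Returns false if the user was already subscribed (the old time is kept).
    bool add(int userId, float timeJoined) {
        return subscribers_.emplace(userId, timeJoined).second;
    }

    bool has(int userId) const { return subscribers_.count(userId) != 0; }

    // Join time of a subscriber. Precondition: has(userId).
    float timeJoined(int userId) const {
        auto it = subscribers_.find(userId);
        assert(it != subscribers_.end());
        return it->second;
    }

private:
    std::string name_;
    std::map<int, float> subscribers_;  // user id -> time joined
};
```

The map keeps its keys sorted just like the set did, so a membership check is still logarithmic.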
Then to find users not related to a channel, you could iterate through the users and easily check if included in the channel.
Alternatively, you could also use a global set of users (either maintaining set membership at the same time as the repository, or creating the set from the map) and use set_difference() to generate the set of users who are not subscribers.
Example of use of set_difference:
set<int> a { 1,2,3,4,5,6,7}; // imagine a set of all the users
set<int> b{ 2,3,8}; // imagine a set of the subscribers of a channel
vector<int> c; // container with the users who are not subscribers
set_difference(a.begin(),a.end(), b.begin(), b.end(), back_inserter(c));
copy(c.begin(), c.end(), ostream_iterator<int>(cout," "));
How to choose between the two approaches? The first approach, iterating and checking, has the advantage of finding the first users quickly, so you can start working with them right away. The iteration can be optimized by making use of the fact that sets and maps are sorted, and you don't need to find all the users. The second approach is elegant, but with a large user base it could take more time, since you need the full result before doing anything.
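The first approach, including the optimization based on sorted containers, could look roughly like the sketch below. It walks the user map and the subscriber set in lockstep and stops at the first user id missing from the set. The function name firstNonSubscriber and the -1 "none found" return value are assumptions, and real ids are assumed to be non-negative.

```cpp
#include <cassert>
#include <map>
#include <set>

struct User {
    int id;
};

// Returns the id of the first user that is not in `subscribers`,
// or -1 if every user is subscribed.
int firstNonSubscriber(const std::map<int, User>& users,
                       const std::set<int>& subscribers) {
    auto sub = subscribers.begin();
    for (const auto& entry : users) {
        // Both containers are sorted by id, so the subscriber cursor only
        // ever moves forward: skip ids smaller than the current user id.
        while (sub != subscribers.end() && *sub < entry.first) ++sub;
        if (sub == subscribers.end() || *sub != entry.first)
            return entry.first;  // this user is not a subscriber
    }
    return -1;
}
```

In the worst case this is one linear pass over both containers, but it returns as soon as the first unrelated user is found.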
Maybe you could try using std::shared_ptr?
Each User would have a set of shared_ptr to the Channels they join which saves memory since one Channel can be joined by multiple Users.
You can do the same to store the Users in your Channels.
And then, you can keep your Users in a vector and use a sort (an efficient one like merge sort maybe; try benchmarking the existing sorts that are out there) that checks whether the User has a pointer to the Channel you're looking at.

Access Key from Values and Value from Key

My project needs both accessors.
Access Value using Key (Simple)
Access Key using Value (Bit tricky)
Value too will be unique in my project
Please suggest the better container to use and how ?
I would like to use either the STL or BOOST.
What you're looking for is called a bidirectional map.
There isn't one in the STL, but you can take a look at Boost.Bimap for an implementation.
If you want to implement it yourself, you can simply use two regular one-way maps. If you use pointers, there should be little memory overhead and decent performance.
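A minimal sketch of the two-map approach, assuming both sides are unique as stated in the question. SimpleBimap and its member names are hypothetical. For simplicity it stores each value twice rather than using pointers, which is fine for small types such as int.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal bidirectional map built from two one-way maps.
template <class L, class R>
class SimpleBimap {
public:
    // Inserts the pair only if neither side is already present,
    // which keeps both directions unique.
    bool insert(const L& l, const R& r) {
        if (left_.count(l) != 0 || right_.count(r) != 0) return false;
        left_.emplace(l, r);
        right_.emplace(r, l);
        return true;
    }

    // Both lookups return nullptr when nothing is found.
    const R* findByLeft(const L& l) const {
        auto it = left_.find(l);
        return it == left_.end() ? nullptr : &it->second;
    }

    const L* findByRight(const R& r) const {
        auto it = right_.find(r);
        return it == right_.end() ? nullptr : &it->second;
    }

    // Erases the pair from both maps so they never disagree.
    bool eraseByLeft(const L& l) {
        auto it = left_.find(l);
        if (it == left_.end()) return false;
        right_.erase(it->second);
        left_.erase(it);
        return true;
    }

private:
    std::map<L, R> left_;
    std::map<R, L> right_;
};
```

The important invariant is that every insert and erase touches both maps. Wrapping them in one class makes that hard to forget.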
That's what I used in my project two days ago.
#include <boost/bimap.hpp>
class ClientManager
{
typedef boost::bimap<
boost::bimaps::set_of<int>,
boost::bimaps::set_of<int>
> ConnectedUsers; // User Id, Instance Id
ConnectedUsers m_connectedUsers;
public:
int getUserId(int instanceId);
int getInstanceId(int userId);
};
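int ClientManager::getInstanceId(int userId)
{
// Look up by user id (left view). Assumes the id is present:
// find() returns end() otherwise, and dereferencing end() is undefined.
auto it = m_connectedUsers.left.find(userId);
return it->second;
}
int ClientManager::getUserId(int instanceId)
{
// Look up by instance id (right view). Same assumption as above.
auto it = m_connectedUsers.right.find(instanceId);
return it->second;
}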
...
// Insert
m_connectedUsers.insert(ConnectedUsers::value_type(id, instanceId));
// Erase
m_connectedUsers.left.erase(userId);
If you want access either way, i.e. key->value and value->key, chances are that your design doesn't need an associative container like a map.
Try a vector of std::pair.
On a side note, if you need to store more than two values, you can use std::tuple.
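A rough sketch of the vector-of-pairs idea, with hypothetical helper names findByKey and findByValue. Both lookups are linear scans, so this suits small collections only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Linear search on the first member; returns nullptr if absent.
const Entry* findByKey(const std::vector<Entry>& entries, int key) {
    auto it = std::find_if(entries.begin(), entries.end(),
                           [&](const Entry& e) { return e.first == key; });
    return it == entries.end() ? nullptr : &*it;
}

// Linear search on the second member; returns nullptr if absent.
const Entry* findByValue(const std::vector<Entry>& entries,
                         const std::string& value) {
    auto it = std::find_if(entries.begin(), entries.end(),
                           [&](const Entry& e) { return e.second == value; });
    return it == entries.end() ? nullptr : &*it;
}
```

Nothing here enforces uniqueness of either member; the caller would have to check before pushing a new pair.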
HTH!!

Data Structure that stores object with names

I want to store objects that are each given a certain name.
I wanted to use a struct and then store them in a vector, but it was suggested to me that I should rather use a different, somewhat simpler data structure, and I can't seem to find one.
My current ("complex") solution:
//in header file
struct objStorage{
Classname obj;
string name;
};
vector<objStorage> vec;
//in constructor
objStorage firstObj;
firstObj.obj = new Classname();
firstObj.name = "foo";
vec.push_back(firstObj);
Is there a more simple solution (Data structure)?
I should add that I don't need the structure once I stored (copied?) it in the vector, because this is all happening in another class (constructor) so I don't want any problems when calling the constructor multiple times.
If you want to lookup items by some key, for example a string, the classic thing to use is a map:
std::map<std::string, Classname> items;
std::pair<std::map<std::string, Classname>::iterator, bool> inserted =
items.insert(std::make_pair(std::string("foo"), Classname()));
items["bar"] = Classname();
In this set up, if you really think you want to use pointers, you should consider some form of smart pointer.
There are other options, for example, C++11 introduces other lookup structures - e.g. unordered maps.
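Combining the last two points, a sketch with std::unordered_map and std::unique_ptr might look like this. Registry and its members are hypothetical, and Classname is reduced to a stand-in with one field.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct Classname {
    int value = 0;  // stand-in for the real class contents
};

class Registry {
public:
    // Creates a new object under `name`; returns false if the name is taken.
    bool create(const std::string& name) {
        if (objects_.count(name) != 0) return false;
        objects_.emplace(name, std::unique_ptr<Classname>(new Classname()));
        return true;
    }

    // Returns the object stored under `name`, or nullptr if there is none.
    // The map keeps ownership; callers must not delete the pointer.
    Classname* find(const std::string& name) {
        auto it = objects_.find(name);
        return it == objects_.end() ? nullptr : it->second.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<Classname>> objects_;
};
```

Because the map owns the objects, they are destroyed with the Registry and no manual delete is needed.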

In a hashmap/unordered_map, is it possible to avoid data duplication when the value already contains the key

Given the following code:
struct Item
{
std::string name;
int someInt;
std::string someString;
Item(const std::string& aName):name(aName){}
};
std::unordered_map<std::string, Item*> items;
Item* item = new Item("testitem");
items.insert(std::make_pair(item->name, item));
The item name will be stored in memory two times - once as part of the Item struct and once as the key of the map entry. Is it possible to avoid the duplication? With some 100M records this overhead becomes huge.
Note:
I need to have the name inside the Item structure because I use the hashmap as index to another container of Item-s, and there I don't have access to the map's key values.
OK, since you say you are using pointers as values, I hereby bring my answer back to life.
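A bit hacky, but should work. Basically, you use a pointer to the name as the key, together with a custom hash function and a custom equality function, so that keys are hashed and compared by the string they point to rather than by address.
struct Item
{
std::string name;
int someInt;
std::string someString;
Item(const std::string& aName):name(aName){}
// Hash the pointed-to string rather than the pointer value.
struct name_hash
{
size_t operator() (const std::string* name) const
{
std::hash<std::string> h;
return h(*name);
}
};
// Compare the pointed-to strings; the default equality would compare addresses.
struct name_equal
{
bool operator() (const std::string* a, const std::string* b) const
{
return *a == *b;
}
};
};
std::unordered_map<const std::string*, Item*, Item::name_hash, Item::name_equal> items;
Item* item = new Item("testitem");
items.insert(std::make_pair(&(item->name), item));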
Assuming the structure you use to store your items in the first place is a simple list, you could replace it with a multi-indexed container.
Something along those lines (untested) should fulfill your requirements:
// assumes: using namespace boost::multi_index;
typedef multi_index_container<
Item,
indexed_by<
sequenced<>,
hashed_unique<member<Item, std::string, &Item::name> >
>
> itemContainer;
itemContainer items;
Now you can access items either in their order of insertion, or look them up by name:
itemContainer::nth_index<0>::type & sequentialItems = items.get<0>();
// use sequentialItems as a regular std::list
itemContainer::nth_index<1>::type & associativeItems = items.get<1>();
// use associativeItems as a regular std::unordered_set
Depending on your needs, you can use other indexings as well.
Don't store the std::string name field in your struct. When you perform a lookup, you already know the name anyway.
TL;DR If you are using libstdc++ (coming with gcc) you are already fine.
There are 3 ways, 2 of which are "simple":
split your object in two, Key and Value, and stop duplicating the Key in the Value
store your object in an unordered_set instead
The 3rd one is more complicated, unless provided by your compiler:
use an implementation of std::string that is reference counted (such as libstdc++'s)
In this case, when you copy a std::string into another, the reference counter of the internal buffer is incremented... and that's all. The copy is deferred until one of the owners requests a modification: Copy On Write. Note that this only holds for libstdc++'s older string implementation: C++11 does not permit copy-on-write for std::string, and libstdc++ stopped using it in its default ABI as of GCC 5.
No, there isn't. You can:
Not store the name in Item and pass it around separately.
Create an ItemData type that has the same fields as Item except the name, and either
derive Item from std::pair<std::string, ItemData> (the value_type of the map) or
make it convertible to and from that type.
Use a reference to the string as the key. You should be able to use std::reference_wrapper<const std::string> as the key, passing std::cref(value.name) when inserting and std::cref of a named std::string when searching (std::cref does not accept temporaries). You will have to provide a hash for the wrapper, and probably an equality function as well, but that should be easy.
Use std::unordered_set, but it has the disadvantage that a lookup has to create a dummy Item.
When you actually have Item * as the value type, you can move the name to a base class and use polymorphism to avoid that disadvantage.
Create a custom hash map, e.g. with Boost.Intrusive.
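The reference-as-key option from the list above could be sketched as follows. The names NameRef, NameRefHash, NameRefEqual, indexItem, and findItem are hypothetical. The sketch assumes that every indexed Item outlives the index and that its name is never modified while indexed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Item {
    std::string name;
    int someInt;
    explicit Item(const std::string& aName) : name(aName), someInt(0) {}
};

// The key refers to the name stored inside the Item, so no copy is made.
using NameRef = std::reference_wrapper<const std::string>;

struct NameRefHash {
    std::size_t operator()(NameRef r) const {
        return std::hash<std::string>()(r.get());
    }
};

struct NameRefEqual {
    bool operator()(NameRef a, NameRef b) const { return a.get() == b.get(); }
};

using ItemIndex = std::unordered_map<NameRef, Item*, NameRefHash, NameRefEqual>;

// Registers the item under a reference to its own name.
void indexItem(ItemIndex& index, Item& item) {
    index.emplace(std::cref(item.name), &item);
}

// `name` is a named parameter here, so std::cref can bind to it even when
// the caller passes a temporary string.
Item* findItem(const ItemIndex& index, const std::string& name) {
    auto it = index.find(std::cref(name));
    return it == index.end() ? nullptr : it->second;
}
```

Each entry then holds a reference and a pointer instead of a second copy of the string, at the cost of the lifetime rules stated above.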

Dynamically storing an internal configuration

I've been thinking of The Right Way (R) to store my program's internal configuration.
Here's the details:
The configuration is runtime only, so generated each run.
It can (and should) be adapted through directives in a "project" file (the reading of that file is not in the scope of this question).
It needs to be extensible, i.e. there should be a way to add new "variables" with assigned values.
My questions about this:
How should I begin with this? Is a class with accessors and setters, with an internal std::map for custom variables, a good option?
Are there any known and "good" ways of doing this?
Should there be a difference between integer, boolean and string configuration variables?
Should there be a difference at all between user and built-in (pre-existing, as in I already thought of them) variables?
Thanks!
PS: If the question isn't clear, feel free to ask for more info.
UPDATE: Wow, every answer seems to have implicitly or explicitly used boost. I should have mentioned I'd like to avoid boost (I want to explore the Standard libraries' capabilities as is for now).
You could use Boost.PropertyTree for this.
Property trees are versatile data
structures, but are particularly
suited for holding configuration data.
The tree provides its own,
tree-specific interface, and each node
is also an STL-compatible Sequence for
its child nodes.
You could do worse than some kind of a property map (StringMap is just a typedef'd std::map)
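#include <map>
#include <string>
#include <boost/lexical_cast.hpp>
// Assumed typedefs (not shown in the original answer)
typedef std::string String;
typedef std::map<String, String> StringMap;
typedef StringMap::const_iterator StringMap_cit;
class PropertyMap
{
private:
StringMap m_Map;
public:
PropertyMap() { };
~PropertyMap() { };
// properties
// Returns the stored value converted to T, or _default if the key is absent.
template<class T>
T get(const String& _key, const T& _default = T()) const
{
StringMap_cit cit(m_Map.find(_key));
return (cit != m_Map.end()) ? boost::lexical_cast<T>(cit->second) : _default;
}; // eo get
// methods
void set(const String& _cap, const String& _value)
{
m_Map[_cap] = _value;
}; // eo set
// Converts any streamable value to a string before storing it.
template<class T>
void set(const String& _key, const T& _val)
{
set(_key, boost::lexical_cast<String>(_val));
}; // eo set
};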
It is very useful to support nesting in configuration files. Something like JSON.
As parameter values can be scalars, arrays and nested groups of parameters, it could be stored in a std::map of boost::variant's, whose value can be a scalar, array or other std::map recursively. Note that std::map sorts by name, so if the original config file order of parameters is important there should be a sequential index of parameters as well. This can be achieved by using boost::multi_index with an ordered or hashed index for fast lookup and a sequential index for traversing the parameters in the original config file order.
I haven't checked, but from what I've heard the boost property map could do that.
It is possible to store all values as strings (or arrays of strings for array values) converting them to the destination type only when accessing it.
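Since the question's update asks to avoid Boost, the store-as-strings idea can be sketched with the standard library alone, using string streams for the conversions. The class name Config and its members are hypothetical. One known limitation: get<T> parses a single whitespace-delimited token, so the sketch has a separate getString for raw string values.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

class Config {
public:
    // Stores any streamable value as text (bools as "true"/"false").
    template <class T>
    void set(const std::string& key, const T& value) {
        std::ostringstream out;
        out << std::boolalpha << value;
        values_[key] = out.str();
    }

    // Converts the stored text to T on access. Returns `fallback` when the
    // key is missing or the text cannot be parsed as T.
    template <class T>
    T get(const std::string& key, const T& fallback = T()) const {
        auto it = values_.find(key);
        if (it == values_.end()) return fallback;
        std::istringstream in(it->second);
        T parsed;
        if (!(in >> std::boolalpha >> parsed)) return fallback;
        return parsed;
    }

    // Returns the raw stored text, including any spaces.
    std::string getString(const std::string& key,
                          const std::string& fallback = std::string()) const {
        auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }

private:
    std::map<std::string, std::string> values_;
};
```

With this layout, built-in and user-defined variables need no special treatment: both are just keys in the same map, and the caller picks the type when reading.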