How to structure lookup tables? - c++

I am implementing lookup tables.
There should be one index column, which can be int, double, or string. If it is int or double, a lookup should find which range the value falls in (so if the indices are 0, 5, 10, etc., and the lookup value is 9, it should return the content corresponding to index 5). If it's a string, it only looks for exact matches.
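For the numeric case, the range rule could be sketched with std::map::upper_bound, assuming the indices are kept in a sorted std::map. The helper name range_lookup is hypothetical, and returning an empty optional for values below the smallest index is an assumption, since that case is not specified here.

```cpp
#include <iterator>
#include <map>
#include <optional>

// Hypothetical helper: returns the content stored under the largest index
// that is <= key, or std::nullopt if key is below the smallest index.
template <class Index, class Content>
std::optional<Content> range_lookup(const std::map<Index, Content>& table,
                                    const Index& key) {
    auto it = table.upper_bound(key);  // first index strictly greater than key
    if (it == table.begin()) {
        return std::nullopt;           // key lies below every index
    }
    return std::prev(it)->second;      // step back to the index <= key
}
```

With indices 0, 5, and 10, range_lookup(table, 9) returns the content stored under 5.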
The content is one or more columns, each of which may (independently of each other and the index) be int, double, or string as well.
The data for the tables should exist in a separate file (preferably one for all tables, regardless of type).
The tables should then be stored in a "semi-dynamic" way (i.e., at run-time they will be fixed, but I need to be able to update, add, or modify the tables between runs).
I figured I need to create a base class of table, which go in a map, regardless of what they contain and what they return. The key is the name of the table, and is read from the file. After that, I'm a bit stumped as to how to proceed.
I first had the idea of creating some sort of template, such as template <class INDEX, class CONTENT, int COL, int ROW> and then define array<INDEX>[ROW] and array<CONTENT>[COL][ROW]. But when I tried that, I couldn't put them in the same map (because the instantiations are interpreted as different classes and thus incompatible to one type?).
Then I figured I should probably use an inheritance class structure, with the base class in the map, and then subclass for each type of table. However, if that's the case, I'd need a lot of classes to cover all the combinations of index and content columns being one of the three data types. So I'd need to generalise a bit more, and then I thought "why don't I store all the data as string and keep an array<int> (or even enum?) to keep track of which of the cases each column is (ie int, double, or string)?" That would be a possible solution, as far as I see, although it would be quite ugly and inefficient. Inefficiency isn't a huge problem, since the tables aren't going to be that many or that large, so storing strings and casting to int/double when needed would be viable, but I'd like to do it properly anyway. Also, there's the issue of having the table class needing all the different versions of lookup (ie that takes in and returns int, double, and/or string), so I might end up calling the wrong function on the object (ie call for an int when the table consists of text that cannot be cast from string.). Again, that's not a huge problem, apart from that it's ugly and probably not the way to do it.
I could also have "empty arrays" of each of the type I'm not using, and keep track of which has data with another array<int> (or enum), and maybe a map<int, array<TYPE> >, where TYPE is the type of each column, and then figure out based on which column I'm looking for what array contains what I have saved/need. But that feels even more convoluted and awkward (but maybe I'm wrong).
Alternatively, I could, of course, start splitting up the tables into smaller ones (ie maybe all have one index and only one content column, or even a structure of "index columns" connected to one or more "content columns"). But then I'd need to figure out a neat way of doing that, which I haven't been able to so far.
Anyway, the data file might look like:
Age Table
0;Kid
18;Young Adult
30;Middle-Age
65;Retired
Average Living Space Per District
Kid;10;14;12
Young Adult;20;30;35
Middle-Age;30;50;50
Retired;50;60;55
{etc}
Does anyone have any hints or advice as to how to tackle this problem?
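One possible direction, sketched under the assumption that C++17 is available: store every cell as a std::variant over the three types, so a single non-template table class fits in one map regardless of its column types. The names Cell, Table, and TableRegistry are hypothetical, and the sketch assumes all index cells of one table hold the same type.

```cpp
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// One cell can hold any of the three column types.
using Cell = std::variant<int, double, std::string>;

// Hypothetical table: rows are keyed by the index cell, so one class covers
// every combination of index and content column types.
struct Table {
    std::map<Cell, std::vector<Cell>> rows;

    // Numeric keys match the greatest index <= key; string keys match exactly.
    // A key whose type differs from the table's index type finds nothing.
    std::optional<std::vector<Cell>> lookup(const Cell& key) const {
        if (rows.empty() || rows.begin()->first.index() != key.index()) {
            return std::nullopt;
        }
        if (std::holds_alternative<std::string>(key)) {
            auto it = rows.find(key);
            if (it == rows.end()) return std::nullopt;
            return it->second;
        }
        auto it = rows.upper_bound(key);  // first index greater than key
        if (it == rows.begin()) return std::nullopt;
        return std::prev(it)->second;
    }
};

// Table name (read from the data file) -> table.
using TableRegistry = std::map<std::string, Table>;
```

The caller still has to know which type to extract from a returned cell (for example with std::get or std::visit), so a mismatched request surfaces as a std::bad_variant_access exception rather than a silent bad cast.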

Related

Creating a generalized resource map without using strings?

Let's assume I want to create an object that will hold some arbitrary data.
// Pseudocode
class MyContainer {
map<key, pair<void*, size>>;
}
The key in this case also identifies the kind of data stored in the void* (e.g an image, a struct of some kind, maybe even a function).
The most general way to deal with this is to have the key be a string. Then you can put whatever on earth you want in it and simply read it back. As a silly example, the key can just be:
"I am storing a png image and the source file was at location/assets/image.png and today is sunday".
i.e you can encode whatever you want. This is however slow. A much faster alternative is using enumerators and your keys are then IMAGE, HASHMAP, FUNCTION, THE_ANSWER_TO_LIFE...
However that requires you know every single case you need to handle beforehand and create an enumerator for it manually (which is tedious and not very extensible).
Is there a compromise that can be made? i.e something that uses only one key but is faster than strings and more extensible than enums?
Edit:
The exact use case I am trying to use this for is as a generalized storage for rendering data. This includes images, vertex buffers, volumetric data, lighting information... Or any other conceivable thing you may need.
The only way I know to create "absolute polymorphism" (i.e represent literally any form of conceivable data) is to use void pointers and rely on algorithms to understand the data.
Example:
Say our key is a JSON string where the key of each element is the name of a field in a compact struct and the value is the offset in bytes.
E.g
{
"m_field1": 0,
"m_field2": 32,
"m_field3": 128
}
Then to access any of the elements in the void* all you need to do is do symbol manipulation to get the number and then ptr + offset.
You can do the same with a set of unique identifiers (enums) and associated functions that get you the fields based on the identifier (hard coded approach).
Hopefully this makes the question less obscure.
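The ptr + offset access described above might look roughly like the following sketch. It assumes the name-to-offset table has already been parsed into a map; Packed, Layout, and read_field are hypothetical names, and the offsets come from offsetof rather than from the JSON values shown.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Hypothetical compact struct that lives behind the void*.
struct Packed {
    int m_field1;
    float m_field2;
    double m_field3;
};

// Field name -> byte offset, e.g. filled in with offsetof(Packed, m_field1).
using Layout = std::unordered_map<std::string, std::size_t>;

// Reads a trivially copyable field of type T at the offset registered under
// name. Throws std::out_of_range if the name is not in the layout.
template <class T>
T read_field(const void* data, const Layout& layout, const std::string& name) {
    T value;
    const auto* bytes = static_cast<const unsigned char*>(data);
    std::memcpy(&value, bytes + layout.at(name), sizeof(T));
    return value;
}
```

The caller must still request the correct T for each field; nothing in the layout map checks that.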

Storing named data, where the 'name' is larger than the 'data'?

I'm writing the logic portion of a game, and want to create, retrieve, and store values (integers) to keep track of progress. For instance, a door would create the pair ("location.room.doorlock", 0) in an std::map, and unlocking this door would set that value to 1. Anytime the player wants to go through this door, it would retrieve the value by that keyname to see if it's passable. (Just an example, but it's important that this information exist outside of the "door" object itself, as characters or other events might retrieve this data and act on it.)
The problem though is that the name (or map key) itself is far larger than the data it's referring to, which seems wasteful, and feels 'wrong' as a result.
Is there a commonly used or best approach for storing this type of data, one where the key isn't so much larger than the data itself?
It is possible to know how much space to allocate at compile time for the progress data itself, if it's important. It need not use std::map either, so long as I don't have to use raw array indices to get or store data.
It seems like you have two options, if you really want to diminish the size of the string (although the string length does not seem to be that bad at all).
You can either just change your naming conventions or implement hashing. Hashing can be implemented in the form of a hashmap (also known as an unordered map) or by hand (you can create a small program that hashes your names to an int, then use that as a pair). Hashmaps/unordered maps are probably your best bet, as there is a lot of support code out there for it and you don't run the risk of having to deal with bugs in your own programs.
http://www.cplusplus.com/reference/unordered_map/unordered_map/
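A minimal sketch of the unordered_map approach, assuming integer progress values as in the question. ProgressStore and its member names are hypothetical, and returning a fallback for unknown names is an assumption.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical store for progress flags, keyed by name.
class ProgressStore {
public:
    // Creates the entry if needed, then stores the value.
    void set(const std::string& name, int value) { values_[name] = value; }

    // Returns the stored value, or fallback if the name was never set.
    int get(const std::string& name, int fallback = 0) const {
        auto it = values_.find(name);
        return it == values_.end() ? fallback : it->second;
    }

private:
    std::unordered_map<std::string, int> values_;
};
```

A door would call set("location.room.doorlock", 1) when unlocked, and any other object can call get with the same name.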

Hierarchical filtered lookup in C++

I have been pondering a data structure problem for a while, but can't seem to come up with a good solution. I can not shake off the feeling that the solution is simple and I'm just not seeing it, however, so hopefully you guys can help!
Here is the problem: I have a large collection of objects in memory. Each of them has a number of data fields. Some of the data fields, such as an ID, are unique for each objects, but others, such as a name, can appear in multiple objects.
class Object {
size_t id;
std::string name;
Histogram histogram;
Type type;
...
};
I need to organize these objects in a way that will allow me to quickly (even if the number of objects is relatively large, i.e. millions) filter the collection given a specification of an arbitrary number of object members while all members that are left unspecified count as wildcards. For example, if I specify a given name, I want to retrieve all the objects whose name member equals the given name. However, if I then add a histogram to the query, I would like the query to return only the objects that match in both the name and the histogram fields, and so on. So, for example, I'd like a function
std::set<Object*> retrieve(size_t, std::string, Histogram, Type)
that can both do
retrieve(42, WILDCARD, WILDCARD, WILDCARD)
as well as
retrieve(42, WILDCARD, WILDCARD, Type_foo)
where the second call would return fewer or equally as many objects as the first one. Which data structure allows queries like this and can both be constructed and queried in reasonable time for object counts in the millions?
Thanks for the help!
First, you could use Boost Multi-index to implement efficient lookup over different members of your Object. This helps limit the number of elements to consider. As a second step, you can use a lambda expression as the predicate for std::find_if to get the first matching element, or use std::copy_if to copy all matching elements to a target sequence. If you decide to use Boost, you can also use Boost Range with filtering.
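The second step alone could be sketched as follows, using only the standard library. This is a linear scan, so it does not replace the indexing step. Query and its std::optional members are hypothetical, with an empty optional standing in for WILDCARD; Histogram is left out to keep the example short, and Type is given two made-up values.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

enum class Type { foo, bar };  // hypothetical values

struct Object {
    std::size_t id;
    std::string name;
    Type type;
};

// An empty optional acts as a wildcard for that member.
struct Query {
    std::optional<std::size_t> id;
    std::optional<std::string> name;
    std::optional<Type> type;
};

// Copies every object that matches all specified members of the query.
std::vector<Object> retrieve(const std::vector<Object>& objects, const Query& q) {
    std::vector<Object> hits;
    std::copy_if(objects.begin(), objects.end(), std::back_inserter(hits),
                 [&q](const Object& o) {
                     return (!q.id || *q.id == o.id) &&
                            (!q.name || *q.name == o.name) &&
                            (!q.type || *q.type == o.type);
                 });
    return hits;
}
```

Adding a member to the query can only shrink the result, which matches the requirement that a more specific call returns no more objects than a less specific one.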

Data structures to implement unknown table schema in c/c++?

Our task is to read information about table schema from a file, implement that table in c/c++ and then successfully run some "select" queries on it. The table schema file may have contents like this,
Tablename- Student
"ID","int(11)","NO","PRIMARY","0","".
Now, my question is what data structures would be appropriate for the task. The problem is that I do not know the number of columns a table might have, what the names of those columns might be, or what their data types are. For example, one table might have just one column of type int, while another might have 15 columns of varying data types. In fact, I don't even know the number of tables the schema file might describe.
One way I thought of was to have a set number of say, 20 vectors (assuming that the upper limit of the columns in a table is 20), name those vectors 1stvector, 2ndvector and so on, map the name of the columns to the vectors, and then use them accordingly. But it seems the code for it would be a mess with all those if/else statements or switch case statements (for the mapping).
While googling/stack-overflowing, I learned that you can't describe a class at runtime otherwise the problem might have been easier to solve.
Any help is appreciated.
Thanks.
As a C++ data structure, you could try a std::vector< std::vector<boost::any> >. A vector is part of the Standard Library and allows the number of elements to be resized dynamically. A vector of vectors gives you an arbitrary number of rows with an arbitrary number of columns. Boost.Any is not part of the Standard Library but is widely available and allows storing arbitrary types.
I am not aware of any good C++ library to do SQL queries on that data structure. You might need to write your own. E.g. the SQL commands select and where would correspond to the STL algorithm std::find_if with an appropriate predicate passed as a function object.
To deal with the lack of knowledge about the column data types, you almost have to store the raw input (i.e. strings, which suggests std::string) and coerce the interpretation as needed later on.
This also has the advantage that the column names can be stored in the same type.
If you really want to determine the column type, you'll need to speculatively parse each column of input to see what it could be and make decisions on that basis.
Either way, if the input could contain a column that has the column separation symbol in it (say, a string including a space in otherwise whitespace-separated data), you will have to know the quoting convention of the input and write a parser of some kind to work on the data (sucking whole lines in with getline is your friend here). Your input appears to be comma-separated with double-quote-delimited strings.
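For input in that style, a small splitter could look like the sketch below. The function split_quoted is hypothetical, and it assumes fields never contain an escaped double quote; anything outside quotes (the separating commas, the trailing period) is simply skipped.

```cpp
#include <string>
#include <vector>

// Splits one line of comma-separated, double-quoted fields.
// Assumes there are no escaped quotes inside a field.
std::vector<std::string> split_quoted(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;
    for (char c : line) {
        if (c == '"') {
            in_quotes = !in_quotes;
            if (!in_quotes) {          // a closing quote ends the field
                fields.push_back(current);
                current.clear();
            }
        } else if (in_quotes) {
            current += c;              // separators inside quotes are kept
        }
    }
    return fields;
}
```

Each line read with std::getline can be passed to this function; the sample schema line yields six fields, the last of which is empty.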
I suggest using std::vector to hold all the table creation statements. After all the creation statements are read in, you can construct your table.
The problem to overcome is the plethora of column types. All the C++ containers like to have a uniform type, such as std::vector<std::string>. You will have different column types.
One solution is to have your data types descend from a single base. That would allow you to have std::vector<Base *> for each row of the table, where the pointers can point to fields of different {child} types.
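A minimal sketch of that idea, with hypothetical names throughout. It uses std::unique_ptr<Field> instead of raw Base * so each row owns its fields, and it shows only two child types.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Common base for every field type in a row.
struct Field {
    virtual ~Field() = default;
    virtual std::string to_string() const = 0;
};

struct IntField : Field {
    int value;
    explicit IntField(int v) : value(v) {}
    std::string to_string() const override { return std::to_string(value); }
};

struct StringField : Field {
    std::string value;
    explicit StringField(std::string v) : value(std::move(v)) {}
    std::string to_string() const override { return value; }
};

// One table row: pointers to fields of different child types.
using Record = std::vector<std::unique_ptr<Field>>;
```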
I'll leave the rest up to the OP to figure out.

Is std::map a good solution?

All,
I have the following task.
I have a finite number of strings (categories). Each category holds a set of team/value pairs. The number of teams is finite and depends on the user's selection.
Neither size exceeds 25.
The value changes based on user input, and whenever it changes, the teams should be re-sorted by value.
I was hoping that STL has some kind of auto sorted vector or list container, but the only thing I could find is std::map<>.
So what I think I need is:
struct Foo
{
std::string team;
double value;
bool operator<(const Foo& other) const;
};
std::map<std::string,std::vector<Foo>> myContainer;
and just call std::sort() whenever a value changes.
Or is there a more efficient way to do it?
[EDIT]
I guess I need to clarify what I mean.
Think about it this way.
You have a table. The rows of this table are teams. The columns of this table are categories. The cells of this table are divided in half. Top half is the category value for a given team. This value is increasing with every player.
Now when the player is added to a team, the scoring categories of the player will be added to a team and the data in the columns will be sorted. So, for category "A" it may be team1, team2; and for category "B" it may be team2, team1.
Then based on the position of each team the score will be assigned for each team/category.
And that score I will need to display.
I hope this clarifies what I am trying to achieve and what I'm looking for.
[/EDIT]
It really depends on how often you are going to modify the data in the map and how often you're just going to be searching for the std::string and grabbing the vector.
If your access pattern is to add a map entry, fill all the entries in its vector, move on to the next entry, fill its vector, and so on, and only afterwards access the map randomly to get a vector, then no, map is probably not the best container. You'd be better off using a vector containing a std::pair of the string and the vector, and sorting it once everything has been added.
In fact organising it as above is probably the most efficient way of setting it up (I admit this is not always possible however). Furthermore it would be highly advisable to use some sort of hash value in place of the std::string as a hash compare is many times faster than a string compare. You also have the string stored in Foo anyway.
map will, however, work but it really depends on exactly what you are trying to do.
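The sorted-vector alternative could be sketched like this. Category, sort_categories, find_category, and rank_teams are hypothetical names, and ranking the highest value first is an assumption about the desired order.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct Foo {
    std::string team;
    double value;
};

// Category name paired with its teams.
using Category = std::pair<std::string, std::vector<Foo>>;

// Call once after all categories have been added.
void sort_categories(std::vector<Category>& cats) {
    std::sort(cats.begin(), cats.end(),
              [](const Category& a, const Category& b) { return a.first < b.first; });
}

// Binary search by name; requires sort_categories to have run first.
// Returns nullptr if the category is absent.
std::vector<Foo>* find_category(std::vector<Category>& cats, const std::string& name) {
    auto it = std::lower_bound(
        cats.begin(), cats.end(), name,
        [](const Category& c, const std::string& n) { return c.first < n; });
    return (it != cats.end() && it->first == name) ? &it->second : nullptr;
}

// Re-rank the teams of one category after a value changes, highest first.
void rank_teams(std::vector<Foo>& teams) {
    std::sort(teams.begin(), teams.end(),
              [](const Foo& a, const Foo& b) { return a.value > b.value; });
}
```

With at most 25 categories and 25 teams, any of these options should be fast enough; the difference matters mainly for larger data.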