Storing named data, where the 'name' is larger than the 'data'? - c++

I'm writing the logic portion of a game, and want to create, retrieve, and store values (integers) to keep track of progress. For instance, a door would create the pair ("location.room.doorlock", 0) in an std::map, and unlocking this door would set that value to 1. Anytime the player wants to go through this door, it would retrieve the value by that keyname to see if it's passable. (Just an example, but it's important that this information exist outside of the "door" object itself, as characters or other events might retrieve this data and act on it.)
The problem though is that the name (or map key) itself is far larger than the data it's referring to, which seems wasteful, and feels 'wrong' as a result.
Is there a commonly used or best approach for storing this type of data, one where the key isn't so much larger than the data itself?
It is possible to know how much space to allocate at compile time for the progress data itself, if it's important. It need not use std::map either, so long as I don't have to use raw array indices to get or store data.

It seems like you have two options, if you really want to diminish the size of the string (although the string length does not seem to be that bad at all).
You can either just change your naming conventions or implement hashing. Hashing can be implemented in the form of a hashmap (also known as an unordered map) or by hand (you can create a small program that hashes your names to an int, then use that as a pair). Hashmaps/unordered maps are probably your best bet, as there is a lot of support code out there for it and you don't run the risk of having to deal with bugs in your own programs.
http://www.cplusplus.com/reference/unordered_map/unordered_map/

Related

Creating a generalized resource map without using strings?

Let's assume I want to create an object that will hold some arbitrary data.
// Pseudocode
class MyContainer {
map<key, pair<void*, size>>;
}
The key in this case also identifies the kind of data stored in the void* (e.g an image, a struct of some kind, maybe even a function).
The most general way to del with this is have the key be a string. Then you can put whatever on earth you want and then you can just read it. As a silly example, the key can just be:
"I am storing a png image and the source file was at location/assets/image.png and today is sunday".
i.e you can encode whatever you want. This is however slow. A much faster alternative is using enumerators and your keys are then IMAGE, HASHMAP, FUNCTION, THE_ANSWER_TO_LIFE...
However that requires you know every single case you need to handle beforehand and create an enumerator for it manually (which is tedious and not very extensible).
Is there a compromise that can be made? i.e something that uses only one key but is faster than strings and more extensible than enums?
Edit:
The exact use case I am trying to use this for is as a generalized storage for rendering data. This includes images, vertex buffers, volumetric data, lighting information... Or any other conceivable thing you may need.
The only way I know to create "absolute polymorphism" (i.e represent literally any form of conceivable data) is to use void pointers and rely on algorithms to understand the data.
Example:
Say our key is a JSON string where the key of each element is the name of a field in a compact struct and the value is the offset in bytes.
E.g
{
m_field1: 0,
m_field2: 32,
m_field3: 128,
}
Then to access any of the elements in the void* all you need to do is do symbol manipulation to get the number and then ptr + offset.
You can do the same with a set of unique identifiers (enums) and associated functions that get you the fields based on the identifier (hard coded approach).
Hopefully this makes the question less obscure.

How to find largest values in C++ Map

My teacher in my Data Structures class gave us an assignment to read in a book and count how many words there are. Thats not all; we need to display the 100 most common words. My gut says to sort the map, but I only need 100 words from the map. After googling around, is there a "Textbook Answer" to sorting maps by the value and not the key?
I doubt there's a "Textbook Answer", and the answer is no: you can't sort maps by value.
You could always create another map using the values. However, this is not the most efficient solution. What I think would be better is for you to chuck the values into a priority_queue, and then pop the first 100 off.
Note that you don't need to store the words in the second data structure. You can store pointers or references to the word, or even a map::iterator.
Now, there's another approach you could consider. That is to maintain a running order of the top 100 candidates as you build your first map. That way there would be no need to do the second pass and build an extra structure which, as you pointed out, is wasteful.
To do this efficiently you would probably use a heap-like approach and do a bubble-up whenever you update a value. Since the word counts only ever increase, this would suit the heap very nicely. However, you would have a maintenance issue on your hands. That is: how you reference the position of a value in the heap, and keeping track of values that fall off the bottom.

What are some good methods to replace string names with integer hashes

Usually, entities and components or other parts of the game code in data-driven design will have names that get checked if you want to find out which object you're dealing with exactly.
void Player::Interact(Entity *myEntity)
{
if(myEntity->isNearEnough(this) && myEntity->GetFamilyName() == "guard")
{
static_cast<Guard*>(myEntity)->Say("No mention of arrows and knees here");
}
}
If you ignore the possibility that this might be premature optimization, it's pretty clear that looking up entities would be a lot faster if their "name" was a simple 32 bit value instead of an actual string.
Computing hashes out of the string names is one possible option. I haven't actually tried it, but with a range of 32bit and a good hashing function the risk of collision should be minimal.
The question is this: Obviously we need some way to convert in-code (or in some kind of external file) string-names to those integers, since the person working on these named objects will still want to refer to the object as "guard" instead of "0x2315f21a".
Assuming we're using C++ and want to replace all strings that appear in the code, can this even be achieved with language-built in features or do we have to build an external tool that manually looks through all files and exchanges the values?
Jason Gregory wrote this on his book :
At Naughty Dog, we used a variant of the CRC-32 algorithm to hash our strings, and we didn't encounter a single collision in over two years of development on Uncharted: Drake's Fortune.
So you may want to look into that.
And about the build step you mentioned, he also talked about it. They basically encapsulate the strings that need to be hashed in something like:
_ID("string literal")
And use an external tool at build time to hash all the occurrences. This way you avoid any runtime costs.
This is what enums are for. I wouldn't dare to decide which resource is best for the topic, but there are plenty to choose from: https://www.google.com/search?q=c%2B%2B+enum
I'd say go with enums!
But if you already have a lot of code already using strings, well, either just keep it that way (simple and usually enough fast on a PC anyway) or hash it using some kind of CRC or MD5 into an integer.
This is basically solved by adding an indirection on top of a hash map.
Say you want to convert strings to integers:
Write a class wraps both an array and a hashmap. I call these classes dictionaries.
The array contains the strings.
The hash map's key is the string (shared pointers or stable arrays where raw pointers are safe work as well)
The hash map's value is the index into the array the string is located, which is also the opaque handle it returns to calling code.
When adding a new string to the system, it is searched for already existing in the hashmap, returns the handle if present.
If the handle is not present, add the string to the array, the index is the handle.
Set the string and the handle in the map, and return the handle.
Notes/Caveats:
This strategy makes getting the string back from the handle run in constant time (it is merely an array deference).
handle identifiers are first come first serve, but if you serialize the strings instead of the values it won't matter.
Operator[] overloads for both the key and the value are fairly simple (registering new strings, or getting the string back), but wrapping the handle with a user-defined class (wrapping an integer) adds a lot of much needed type safety, and also avoids ambiguity if you want the key and the values to be the same types (overloaded[]'s wont compile and etc)
You have to store the strings in RAM, which can be a problem.

An integer hashing problem

I have a (C++) std::map<int, MyObject*> that contains a couple of millions of objects of type MyObject*. The maximum number of objects that I can have, is around 100 millions. The key is the object's id. During a certain process, these objects must be somehow marked( with a 0 or 1) as fast as possible. The marking cannot happen on the objects themselves (so I cannot introduce a member variable and use that for the marking process). Since I know the minimum and maximum id (1 to 100_000_000), the first thought that occured to me, was to use a std::bit_set<100000000> and perform my marking there. This solves my problem and also makes it easier when marking processes run in parallel, since these use their own bit_set to mark things, but I was wondering what the solution could be, if I had to use something else instead of a 0-1 marking, e.g what could I use if I had to mark all objects with an integer number ?
Is there some form of a data structure that can deal with this kind of problem in a compact (memory-wise) manner, and also be fast ? The main queries of interest are whether an object is marked, and with what was marked with.
Thank you.
Note: std::map<int, MyObject*> cannot be changed. Whatever data structure I use, must not deal with the map itself.
How about making the value_type of your map a std::pair<bool, MyObject*> instead of MyObject*?
If you're not concerned with memory, then a std::vector<int> (or whatever suits your need in place of an int) should work.
If you don't like that, and you can't modify your map, then why not create a parallel map for the markers?
std::map<id,T> my_object_map;
std::map<id,int> my_marker_map;
If you cannot modify the objects directly, have you considered wrapping the objects before you place them in the map? e.g.:
struct
{
int marker;
T *p_x;
} T_wrapper;
std::map<int,T_wrapper> my_map;
If you're going to need to do lookups anyway, then this will be no slower.
EDIT: As #tenfour suggests in his/her answer, a std::pair may be a cleaner solution here, as it saves the struct definition. Personally, I'm not a big fan of std::pairs, because you have to refer to everything as first and second, rather than by meaningful names. But that's just me...
The most important question to ask yourself is "How many of these 100,000,000 objects might be marked (or remain unmarked)?" If the answer is smaller than roughly 100,000,000/(2*sizeof(int)), then just use another std::set or std::tr1::unordered_set (hash_set previous to tr1) to track which ones are so marked (or remained unmarked).
Where does 2*sizeof(int) come from? It's an estimate of the amount of memory overhead to maintain a heap structure in a deque of the list of items that will be marked.
If it is larger, then use std::bitset as you were about to use. It's overhead is effectively 0% for the scale of quantity you need. You'll need about 13 megabytes of contiguous ram to hold the bitset.
If you need to store a marking as well as presence, then use std::tr1::unordered_map using the key of Object* and value of marker_type. And again, if the percentage of marked nodes is higher than the aforementioned comparison, then you'll want to use some sort of bitset to hold the number of bits needed, with suitable adjustments in size, at 12.5 megabytes per bit.
A purpose-built object holding the bitset might be your best choice, given the clarification of the requirements.
Edit: this assumes that you've done proper time-complexity computations for what are acceptable solutions to you, since changing the base std::map structure is no longer permitted.
If you don't mind using hacks, take a look at the memory optimization used in Boost.MultiIndex. It can store one bit in the LSB of a stored pointer.

C++ map really slow?

i've created a dll for gamemaker. dll's arrays where really slow so after asking around a bit i learnt i could use maps in c++ and make a dll.
anyway, ill represent what i need to store in a 3d array:
information[id][number][number]
the id corresponds to an objects id. the first number field ranges from 0 - 3 and each number represents a different setting. the 2nd number field represents the value for the setting in number field 1.
so..
information[101][1][4];
information[101][2][4];
information[101][3][4];
this would translate to "object with id 101 has a value of 4 for settings 1, 2 and 3".
i did this to try and copy it with maps:
//declared as a class member
map<double, map<int, double>*> objIdMap;
///// lower down the page, in some function
map<int, double> objSettingsMap;
objSettingsMap[1] = 4;
objSettingsMap[2] = 4;
objSettingsMap[3] = 4;
map<int, double>* temp = &objSettingsMap;
objIdMap[id] = temp;
so the first map, objIdMap stores the id as the key, and a pointer to another map which stores the number representing the setting as the key, and the value of the setting as the value.
however, this is for a game, so new objects with their own id's and settings might need to be stored (sometimes a hundred or so new ones every few seconds), and the existing ones constantly need to retrieve the values for every step of the game. are maps not able to handle this? i has a very similar thing going with game maker's array's and it worked fine.
Do not use double's as a the key of a map.
Try to use a floating point comparison function if you want to compare two doubles.
1) Your code is buggy: You store a pointer to a local object objSettingsMap which will be destroyed as soon as it goes out of scope. You must store a map obj, not a pointer to it, so the local map will be copied into this object.
2) Maps can become arbitrarily large (i have maps with millions of entrys). If you need speed try hash_maps (part of C++0x, but also available from other sources), which are considerably faster. But adding some hundred entries each second shouldn't be a problem. But befre worring about execution speed you should always use a profiler.
3) I am not really sure if your nested structures MUST be maps. Depending of what number of setting you have, and what values they may have, a structure or bitfield or a vector might be more accurate.
If you need really fast associative containers, try to learn about hashes. Maps are 'fast enough' but not brilliant for some cases.
Try to analyze what is the structure of objects you need to store. If the fields are fixed I'd recommend not to use nested maps. At all. Maps are usually intended for 'average' number of indexes. For low number simple lists are more effective because of insert / erase operations lower complexity. For great number of indexes you really need to think about hashing.
Don't forget about memory. std::map is highly dynamic template so on small objects stored you loose tons of memory because of dynamic allocation. Is it what you are really expecting? Once I was involved in std::map usage removal which lowered memory requirements in about 2 times.
If you only need to fill the map at startup and only search for elements (don't need to change structure) I'd recommend simple std::vector with sort applied after all the elems inserted. And then you can just use binary search (as you have sorted vector). Why? std::vector is much more predictable thing. The biggest advantage is continuous memory area.