Fastest way to "resurrect" (serialize and deserialize) an std::map - c++

As part of my test code I need to build a complex structure that uses, among other things, two std::map instances; both of them have around 1 million elements. In optimized builds it's OK, however, in debug un-optimized builds it takes almost a minute. I use the same data to build that map every time, so basically if I could save a chunk of RAM and restore it in 20ms then I'd effectively get the same map in my app without waiting a minute every run. What can I do to speed it up? I could try to use a custom allocator and save/restore its allocated storage, or is there perhaps a way to construct a std::map from data that is already sorted, so that construction would be linear in time?
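For what it's worth, the last idea in the question does work: the standard guarantees that std::map's range constructor is linear when the input range is already sorted by the map's comparator, and inserting with an end() hint is amortized constant per element for ascending keys. A minimal sketch (using int keys/values purely for illustration), which of course does not remove MSVC's per-node debug checking:
#include <map>
#include <utility>
#include <vector>
// Linear-time construction from data that is already sorted by key.
std::map<int, int> build_from_sorted(const std::vector<std::pair<int, int>>& sorted) {
    return std::map<int, int>(sorted.begin(), sorted.end());
}
// Equivalent incremental form: an end() hint makes each insert amortized O(1)
// when keys arrive in ascending order.
std::map<int, int> build_with_hint(const std::vector<std::pair<int, int>>& sorted) {
    std::map<int, int> m;
    for (const auto& kv : sorted)
        m.emplace_hint(m.end(), kv.first, kv.second);
    return m;
}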

The technical difficulty is that, for std::map in debug mode, the Visual Studio compiler inserts correctness checks, and in some revisions has added extra members to the structure to make that checking easier.
There are two possible solutions:
Abstraction
If the information provided by the std::map is replaceable by an interface class, then the internals of the std::map can be hidden and moved into a separate compilation unit. This can be compiled outside of the debug environment and performance restored.
Alternative data structure
For information that is broadly static (e.g. a fixed data set you need to retrieve from quickly), std::map is not the fastest way to achieve this; a sorted std::vector of std::pair<key,value> would be more performant in operation.
The advantage of the std::vector is that there are guarantees about its layout. If the data is plain old data (POD), it can be loaded with a std::vector::resize followed by a memcpy into its contiguous storage. Otherwise, filling the elements of the std::vector would still avoid the significant time Visual Studio spends tracking the memory and structure of the std::map for issues.
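As a rough sketch of that alternative (the Entry type and the restore() helper are assumptions for illustration, not part of the answer):
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>
struct Entry { int key; int value; };                      // plain old data
// Lookup in the sorted table with std::lower_bound (O(log N), cache-friendly).
const Entry* find(const std::vector<Entry>& table, int key) {
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const Entry& e, int k) { return e.key < k; });
    return (it != table.end() && it->key == key) ? &*it : nullptr;
}
// Restoring from a previously saved raw buffer: size the vector first
// (resize, not just reserve), then memcpy into its contiguous storage.
std::vector<Entry> restore(const void* bytes, std::size_t count) {
    std::vector<Entry> table(count);
    std::memcpy(table.data(), bytes, count * sizeof(Entry));
    return table;
}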

Eventually, after trying different approaches, I ended up using a custom allocator.
std::map was only one of many containers used by my struct to hold the data. The total size of the allocated RAM was actually around 400 MB; the struct contained lists, maps, and vectors of different data, and many members of these containers were pointers into other containers. So I took a radical approach and made it extremely fast to 'resurrect' my entire structure, with all its maps and internal pointers. While my post was originally about making it fast in debug builds, after modifying the code and adding extra complexity the same problem applied to release builds as well: construction time had grown to around 10 seconds in a release build.
So, first I modified all structure members to use my custom allocator; this way I could see how much RAM was actually allocated:
total allocated: 1970339320 bytes, total freed: 1437565512 bytes
From this I could estimate that I would need around 600 MB in total. Then I added a static method my_aloc::start_recording() to my custom allocator; this method allocates that 600 MB chunk of RAM, and once start_recording has been called the allocator simply returns addresses from that 600 MB block. After calling start_recording I made a copy of my entire structure (it was actually a vector of structures). When making a copy there is no over-allocation: each structure member allocates only enough RAM for its storage, so copying the structs actually used only around 400 MB instead of 600 MB.
As I said, my structure internally contains lots of pointers to its own members, so how do I reuse this 400 MB recorded "snapshot" from my custom allocator? I could write code to "patch" the pointers, but that might not even work: I had many maps that use pointers as keys with a custom compare struct that dereferences the pointers to compare the pointed-to values. Some maps also contained iterators into lists; it would be quite messy to deal with all of that. On top of that, my overall structure isn't set in stone, it's a work in progress, and if something changes then the patching code would also need to change. So the answer was quite obvious: I simply needed to load the entire 400 MB snapshot at the same base address. On Windows I use VirtualAlloc; on Linux something like mmap would be needed, or alternatively the Boost shared-memory library could be used to make it more portable. In the end, overall load time went down to 150 ms, whereas rebuilding takes more than 10 seconds in release and is probably somewhere in the minutes by now in debug builds.
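To make this concrete, here is a minimal Windows-only sketch of the recording allocator plus fixed-base snapshot; the names and numbers (kBaseAddress, kBlockSize, record_alloc, "snapshot.bin") are illustrative assumptions of mine rather than the poster's actual code, and error handling is omitted:
#include <windows.h>
#include <cstddef>
#include <cstdio>
static char*       g_block = nullptr;      // block handed out by the custom allocator
static std::size_t g_used  = 0;            // bump-pointer offset inside the block
const std::size_t  kBlockSize   = 600u * 1024 * 1024;
void* const        kBaseAddress = reinterpret_cast<void*>(0x200000000ull);
void start_recording() {
    // Reserve and commit the block at a fixed address, so a reloaded snapshot
    // sees exactly the same pointer values it was saved with.
    g_block = static_cast<char*>(VirtualAlloc(kBaseAddress, kBlockSize,
                                              MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    g_used = 0;
}
void* record_alloc(std::size_t n) {        // what the allocator returns while recording
    void* p = g_block + g_used;
    g_used += (n + 15) & ~std::size_t(15); // keep 16-byte alignment
    return p;
}
void save_snapshot(const char* path) {     // the root objects must also live inside the block
    std::FILE* f = std::fopen(path, "wb");
    std::fwrite(&g_used, sizeof g_used, 1, f);
    std::fwrite(g_block, 1, g_used, f);
    std::fclose(f);
}
void load_snapshot(const char* path) {     // re-map at the SAME base, then read the bytes back
    start_recording();
    std::FILE* f = std::fopen(path, "rb");
    std::fread(&g_used, sizeof g_used, 1, f);
    std::fread(g_block, 1, g_used, f);
    std::fclose(f);
}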

Related

Why do we prefer vectors instead of arrays in C++? Because of core dump? [duplicate]

I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and they can be attractive when:
the elements are specified at compile time, e.g. const char project[] = "Super Server";, const Colours colours[] = { Green, Yellow };
with C++11 it will be equally concise to initialise std::vectors with values
the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" };, Piece chess_board[8][8];
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster than run-time heap allocation (new[]) followed by serialised construction of objects
compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly thread-safe, which will be even slower)
In C++03, vectors created with an initial size would construct one prototypical element object then copy-construct each element from it. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements, replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter).
embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they have size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures"), etc. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced a std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
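A tiny illustration of that (the values here are arbitrary):
#include <algorithm>
#include <array>
// Fixed size, no heap allocation, but with the container API (size(), at(), iterators).
std::array<int, 8> scores = {3, 1, 4, 1, 5, 9, 2, 6};
int max_score() {
    return *std::max_element(scores.begin(), scores.end());
}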
One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for C++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are guaranteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out <algorithm>, which provides a lot of common algorithms for vector and other STL containers.
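A minimal sketch of the RAII point (parse() here is a made-up stand-in for any operation that might throw):
#include <cstddef>
#include <stdexcept>
#include <vector>
void parse(int* data, std::size_t n) {      // stand-in that may throw
    if (n == 0) throw std::runtime_error("empty input");
    data[0] = 42;
}
void with_raw_array(std::size_t n) {
    int* buffer = new int[n];
    parse(buffer, n);                       // leaks if parse() throws before delete[]
    delete[] buffer;
}
void with_vector(std::size_t n) {
    std::vector<int> buffer(n);
    parse(buffer.data(), n);                // buffer is freed even if parse() throws
}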
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array or a Blitz::Array would have been better than either of them.
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.
I am going to add my opinion here on coding with large arrays/vectors in science and engineering.
Pointer-based arrays in this case can be quite a bit faster, especially for built-in types, but the pointers add the danger of memory leaks, which can lead to longer debug cycles. Additionally, if you want to make a pointer-based array dynamic, you have to code that by hand.
On the other hand, vectors can be slower for built-in types. They are also both dynamic and memory-safe, as long as you are not storing dynamically allocated pointers in the vector.
In science and engineering the choice depends on the project: how important is speed versus debug time? For example, LAMMPS, a simulation package, uses raw pointers handled through its memory-management class; speed is the priority for that software. In the software I am building, I have to balance speed with memory footprint and debug time. I really don't want to spend a lot of time debugging, so I am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large-scale arrays and a lot of reading on the web. Another problem with std::vector and very large arrays (a million elements and up) is how memory gets allocated. std::vector uses std::allocator, which in typical implementations simply forwards to operator new/delete; the bigger issue is the vector's geometric growth strategy, which deliberately keeps capacity larger than the current size, so as the vector grows into the millions of elements that spare capacity can amount to a lot of memory held beyond what is actually in use.
For large-scale vectors you are either better off writing your own vector class or using pointers (raw, or through some sort of memory-management facility from Boost or the C++ standard library). There are advantages and disadvantages to both approaches; the choice really depends on the exact problem you are tackling (too many variables to go into here). If you do write your own vector class, make sure to give it an easy way to release its memory. For std::vector you currently need to use the swap trick shown below to do something that arguably should have been built into the class in the first place.
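For reference, a sketch of that swap idiom (C++11 later added shrink_to_fit(), though it is only a non-binding request to the implementation):
#include <vector>
void release_excess(std::vector<double>& v) {
    // C++03 idiom: swap with a right-sized temporary to drop spare capacity.
    std::vector<double>(v.begin(), v.end()).swap(v);
}
void release_all(std::vector<double>& v) {
    // clear() usually keeps the capacity; swapping with an empty vector frees it.
    std::vector<double>().swap(v);
}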
Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using array syntax, such as int evenOddCount[2];. If you are choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2];, the answer is clear: USE VECTOR!
It's a rare case in the real world where you deal with fixed collections of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed, it is the hallmark of a good program that it can accommodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
You have implemented a view controller to track AI combatants in an FPS. The game logic spawns a random number of combatants in various zones every couple of seconds. The player is downing AI combatants at a rate known only at run time.
A lawyer has accessed the Municipal Court website in his state and is querying the number of new DUI cases that came in overnight. He chooses to filter the list by a set of variables including the time the accident occurred, zip code, and arresting officer.
The operating system needs to maintain a list of memory addresses in use by the various programs running on it. The number of programs and their memory usage changes in unpredictable ways.
In any of these cases a good argument can be made that a variable-size list (one that accommodates dynamic inserts and deletes) will serve better than a simple array, the main benefit being that you avoid having to manually allocate, deallocate and resize a fixed array's storage as elements are added or removed.
As far as arrays are concerned, simple integer or string arrays are very easy to use. On the other hand, for common operations like searching, sorting, insertion and removal, you can achieve much faster speeds using the standard algorithms (built-in library functions) on vectors, especially if you are using vectors of objects.
Secondly, there is the huge difference that vectors can grow dynamically as more objects are inserted. Hope that helps.


Memory usage in C++ program

I wrote a program that needs to handle very large data sets using the following libraries:
vector
boost::unordered_map
boost::unordered_multimap
So, I'm having memory problems (the program uses a LOT of memory) and I was thinking maybe I could replace these containers with something that already exists or with my own implementations.
So, three questions:
How much memory would I save if I replaced vector with a C array? Is it worth it?
Can someone explain how memory is used in boost::unordered_map and boost::unordered_multimap in the current implementation? That is, what is stored in order to achieve their performance?
Can you recommend some libraries that outperform boost::unordered_map and boost::unordered_multimap in memory usage (but are not too slow)?
std::vector is memory efficient. I don't know about the boost maps, but the Boost people usually know what they're doing; I doubt you'll save a lot of memory by creating your own variants.
You can do a few other things to help with memory issues:
Compile in 64-bit mode. Running out of address space in a 64-bit process is very hard.
You won't run out of memory, but memory might get swapped out. You should instead see whether you need to load everything into memory at once; perhaps you can work on chunks of the data at a time.
As a side benefit, working on a chunk of the data at a time allows you to run your code in parallel.
With memory so cheap nowadays that allocating 10 GB of RAM is very simple, I'd guess your bottleneck will be the processing you do on the data, not allocating the data.
These two articles explain the data structures underlying some common implementations of unordered associative containers:
Implementation of C++ unordered associative containers
Implementation of C++ unordered associative containers with duplicate elements
Even though there are some differences between implementations, they are modest: one word per element at most. If you go with a minimum-overhead solution such as sorted vectors, this would gain you 2-3 words per element, not even a 2x improvement if your objects are large. So you'd probably be better off resorting to an environment with more memory, or radically changing your approach by using a database or something similar.
If you have only one set of data and multiple ways of accessing it, you can try boost::multi_index; here is the documentation.
std::vector is basically a contiguous array, plus a few bytes of overhead. About the only way you'll improve with vector is by using a smaller element type. Can you store a short int instead of a regular int? If so, you can cut the vector memory down by half.
Are you perhaps using these containers to hold pointers to many objects on the heap? If so, you may have a lot of wasted space in the heap that could be saved by writing custom allocators, or by doing away with a pointer to a heap element altogether, and by storing a value type within the container.
Look within your class types. Consider all pointer types, and whether they need to be dynamic storage or not. The typical class often has pointer members hanging off a base object, which means a single object is a graph of memory chunks in itself. The more you can inline your class members, the more efficient your use of the heap will be.
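As a small sketch of that last point (Sample is a made-up type for illustration): storing objects by value in the container replaces one heap block and one pointer per element with a single contiguous allocation.
#include <memory>
#include <vector>
struct Sample { double x, y, z; };
std::vector<std::unique_ptr<Sample>> scattered;   // N heap allocations + N pointers
std::vector<Sample>                  packed;      // one contiguous allocation, no pointers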
RAM is cheap in 2014. Easy enough to build x86-64 Intel boxes with 64-256GB of RAM and Solid State Disk as fast swap if your current box isn't cutting it for the project. Hopefully this isn't a commercial desktop app we are discussing. :)
I ended up replacing boost::unordered_multimap with a std::unordered_map of vector.
boost::unordered_multimap consumes more than twice the memory consumed by a std::unordered_map of vector, due to the extra pointers it keeps (at least one extra pointer per element) and the fact that it stores the key alongside the value for every element, while the unordered_map of vector stores each key only once, for a vector that contains all the colliding elements.
In my particular case, I was trying to store about 4 billion integers, which consume about 15 GB of memory in the ideal case. Using the multimap I ended up consuming more than 40 GB, while using the map of vectors I use about 15 GB (a little bit more due to the pointers and other structures, but their size is negligible).
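A sketch of that layout change (the integer types here are placeholders, not the poster's actual ones):
#include <cstdint>
#include <unordered_map>
#include <vector>
// Before: boost::unordered_multimap stores one (key, value) node per element.
// After: the key appears once, and all values sharing it sit in one vector.
std::unordered_map<std::uint64_t, std::vector<std::uint32_t>> index;
void add(std::uint64_t key, std::uint32_t value) {
    index[key].push_back(value);
}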

c++ Alternative implementation to avoid shifting between RAM and SWAP memory

I have a program that uses dynamic programming to calculate some information. The problem is that, in theory, the memory used grows exponentially. Some filters that I use limit this space, but for a big input they still can't prevent my program from running out of RAM.
The program runs on 4 threads. When I run it with a really big input, I noticed that at some point the program starts to use swap because my RAM is not big enough. The consequence is that my CPU usage drops from about 380% to 15% or lower.
There is only one variable that uses this memory, which is the following data structure:
Edit (added type) with CLN library:
class My_Map {
    typedef std::pair<double, short> key;      // map key: a (double, short) pair
    typedef cln::cl_I value;                   // arbitrary-precision integer from CLN
public:
    tbb::concurrent_hash_map<key, value>* map;
    My_Map()  { map = new tbb::concurrent_hash_map<key, value>(); }
    ~My_Map() { delete map; }
    // ...some functions for operations on the map
};
In my main program I am using this data structure as a global variable:
My_Map* container = new My_Map();
Question:
Is there a way to avoid this shifting of memory between swap and RAM? I thought pushing all the memory onto the heap would help, but it seems not to. So I don't know whether it is possible to somehow make full use of the swap memory, or something else. This shifting of memory alone costs a lot of time, and the CPU usage decreases dramatically.
If you have 1 GB of RAM and a program that uses 2 GB, then you're going to have to find somewhere else to store the excess data, obviously. The default OS mechanism is swap, but the alternative is to manage your own 'swapping' by using a memory-mapped file.
You open a file and allocate a virtual memory block in it, then you bring pages of the file into RAM to work on. The OS manages this for you for the most part, but you should think about your memory usage so that, as far as possible, you finish working with the blocks that are currently in memory before touching others.
On Windows you use CreateFileMapping(); on Linux and Mac you use mmap().
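A minimal POSIX sketch of that approach (the path and size are placeholders, error handling is kept to a minimum, and the Windows equivalent would use CreateFileMapping()/MapViewOfFile()):
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
// Back a large working buffer with a file so the OS pages it against that file.
void* map_scratch(const char* path, std::size_t bytes) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                  // the mapping keeps the file alive
    return p == MAP_FAILED ? nullptr : p;
}
void unmap_scratch(void* p, std::size_t bytes) {
    munmap(p, bytes);
}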
The OS is working properly: it doesn't distinguish between stack and heap when swapping; it pages out whatever you don't seem to be using and loads whatever you ask for.
There are a few things you could try:
consider whether myType can be made smaller - e.g. using int8_t or even width-appropriate bitfields instead of int, using pointers to pooled strings instead of worst-case-length character arrays, using offsets into arrays where they're smaller than pointers, etc. If you show us the type, maybe we can suggest things.
think about your paging - if you have many objects on one memory page (likely 4k) they will need to stay in memory if any one of them is being used, so try to get objects that will be used around the same time onto the same memory page - this may involve hashing to small arrays of related myType objects, or even moving all your data into a packed array if possible (binary searching can be pretty quick anyway). Naively used hash tables tend to scatter related objects across memory, because similar objects are put in completely unrelated buckets.
serialisation/deserialisation with compression is a possibility: instead of letting the OS swap out full myType memory, you may be able to proactively serialise them into a more compact form then deserialise them only when needed
consider whether you need to process all the data simultaneously... if you can batch up the work in such a way that you get all "group A" out of the way using less memory then you can move on to "group B"
UPDATE now you've posted your actual data types...
Sadly, using short might not help much because sizeof key needs to be 16 anyway for alignment of the double; if you don't need the precision, you could consider float? Another option would be to create an array of separate maps...
tbb::concurrent_hash_map<double,value> map[65536];
You then index into map[my_short] and look my_double up in that map. It could be better or worse, but it's easy to try, so you might as well benchmark.
For cl_I a 2-minute dig suggests the data's stored in a union - presumably word is used for small values and one of the pointers when necessary... that looks like a pretty good design - hard to improve on.
If numbers tend to repeat a lot (a big if), you could experiment with e.g. keeping a registry of big cl_Is with a bi-directional mapping to packed integer ids, which you'd store in My_Map::map - fussy though. To explain: say you get 987123498723489 - you push_back it onto a vector<cl_I>, then in a hash_map<cl_I, int> you map 987123498723489 to that index (i.e. vector.size() - 1). Keep going as new numbers are encountered. You can always map from an int id back to a cl_I using direct indexing in the vector, and the other way is an O(1) amortised hash table lookup.
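A generic sketch of that registry idea, using std::string as a stand-in for cln::cl_I (which would need its own hash function) and made-up names:
#include <string>
#include <unordered_map>
#include <vector>
class BigIntRegistry {
public:
    // value -> compact id: O(1) amortised hash lookup, inserting on first sight.
    int intern(const std::string& big) {
        auto it = ids_.find(big);
        if (it != ids_.end()) return it->second;
        values_.push_back(big);
        int id = static_cast<int>(values_.size()) - 1;
        ids_.emplace(big, id);
        return id;
    }
    // id -> value: direct indexing into the vector.
    const std::string& lookup(int id) const { return values_[id]; }
private:
    std::vector<std::string> values_;
    std::unordered_map<std::string, int> ids_;
};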

mmap-loadable data structure library for C++ (or C)

I have a large data structure (N > 10,000) that usually only needs to be created once (at runtime) and can be reused many times afterwards, but it needs to be loaded very quickly. (It is used for user input processing on iPhoneOS.) mmap-ing a file seems to be the best choice.
Are there any data structure libraries for C++ (or C)? Something along the lines of:
ReadOnlyHashTable<char, int> table ("filename.hash");
// mmap(...) inside the c'tor
...
int freq = table.get('a');
...
// munmap(...); inside the d'tor.
Thank you!
Details:
I've written a similar hash-table class myself, but I find it pretty hard to maintain, so I would like to see if there are existing solutions already. The library should:
Contain a creation routine that serializes the data structure into a file. This part doesn't need to be fast.
Contain a loading routine that mmaps a file into a read-only (or read-write) data structure that is usable within O(1) steps of processing.
Use O(N) disk/memory space with a small constant factor. (The device has serious memory constraints.)
Add only a small time overhead to accessors (i.e. the complexity isn't changed).
Assumptions:
Bit representation of data (e.g. endianness, encoding of float, etc.) does not matter since it is only used locally.
So far the possible types of data I need are integers, strings, and structs of them. Pointers do not appear.
P.S. Can Boost.intrusive help?
You could try to create a memory-mapped file and then create the STL map structure with a custom allocator. Your custom allocator simply takes the beginning of the memory of the memory-mapped file and then increments its pointer according to the requested size.
In the end all the allocated memory should lie within the memory-mapped file and should be reloadable later.
You will have to check whether memory is freed by the STL map. If it is, your custom allocator will lose some memory of the memory-mapped file, but if this is limited you can probably live with it.
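A minimal sketch of that allocator idea (g_region/g_offset are assumed to point into a memory-mapped file set up elsewhere; alignment is simplified and per-node deallocation is deliberately a no-op):
#include <cstddef>
#include <map>
extern char*       g_region;    // start of the memory-mapped file
extern std::size_t g_offset;    // bump pointer, persisted alongside the data
template <class T>
struct MappedAllocator {
    using value_type = T;
    MappedAllocator() = default;
    template <class U> MappedAllocator(const MappedAllocator<U>&) {}
    T* allocate(std::size_t n) {
        // Round the request up to max_align_t so every node stays aligned.
        std::size_t bytes = (n * sizeof(T) + alignof(std::max_align_t) - 1)
                            & ~(alignof(std::max_align_t) - 1);
        T* p = reinterpret_cast<T*>(g_region + g_offset);
        g_offset += bytes;
        return p;
    }
    void deallocate(T*, std::size_t) {}   // freed nodes are simply lost inside the region
};
template <class A, class B>
bool operator==(const MappedAllocator<A>&, const MappedAllocator<B>&) { return true; }
template <class A, class B>
bool operator!=(const MappedAllocator<A>&, const MappedAllocator<B>&) { return false; }
// Example: a std::map whose nodes live entirely inside the mapped region.
using MappedMap = std::map<int, int, std::less<int>,
                           MappedAllocator<std::pair<const int, int>>>;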
Sounds like maybe you could use one of the "perfect hash" utilities out there. These spend some time optimising the hash function for the particular data, so that there are no hash collisions and (for minimal perfect hash functions) so that there are no (or at least few) empty gaps in the hash table. Obviously, this is intended to be generated rarely but used frequently.
CMPH claims to cope with large numbers of keys. However, I have never used it.
There's a good chance it only generates the hash function, leaving you to use that to generate the data structure. That shouldn't be especially hard, but it possibly still leaves you where you are now - maintaining at least some of the code yourself.
Just thought of another option - Datadraw. Again, I haven't used this, so no guarantees, but it does claim to be a fast persistent database code generator.
WRT boost.intrusive, I've just been having a look. It's interesting. And annoying, as it makes one of my own libraries look a bit pointless.
I thought this section looked particularly relevant.
If you can use "smart pointers" for links, presumably the smart pointer type can be implemented using a simple offset-from-base-address integer (and I think that's the point of the example). An array subscript might be equally valid.
There's certainly unordered set/multiset support (C++ code for hash tables).
Using cmph would work. It does have the serialization machinery for the hash function itself, but you still need to serialize the keys and the data, besides adding a layer of collision resolution on top of it if your query set universe is not known beforehand. If you know all keys beforehand, then it is the way to go, since you don't need to store the keys and will save a lot of space. If not, for such a small set, I would say it is overkill.
Probably the best option is to use google's sparse_hash_map. It has very low overhead and also has the serialization hooks that you need.
http://google-sparsehash.googlecode.com/svn/trunk/doc/sparse_hash_map.html#io
GVDB (GVariant Database), the core of dconf, is exactly this.
See git.gnome.org/browse/gvdb, dconf and bv
and developer.gnome.org/glib/2.30/glib-GVariant.html