c++ boost::multi_array index too large

c++ boost::multi_array index too large - c++

I'm using a two-dimensional boost::multi_array to store objects of a custom struct. The problem is that I have a huge amount of these objects so that the index of the array I would need exceeds the range of an integer. Is there any possibility to use long as an index of a multi-array or do you have any other suggestions on how to store a dataset this big and still keep it accessible at a decent speed?
Thanks!

The official documentation states that the index type is unspecified, but looking into the repository, one sees that the definition most likely is typedef std::ptrdiff_t index;
So if you compile for an x86 32-bit system, you will surely run out of addressable memory anyways, so the limited size of indicies is not your real problem. Your only option would be to chose a system with enough memory, which has to be one with more than 2^32 bytes and thus has to be a 64 bit one. 2^64 will be certainly enough to represent the dimensions of your multiarray.

Related

avoiding array index promotion in hopes of better performance

I have a huge array of integers and those integers are not greater than 0xFFFF. Therefore I would like save some space and store them as unsigned short.
unsigned short addresses[50000 /* big number over here */];
Later I would use this array as follows
data[addresses[i]];
When I use only 2 bytes to store my integers, they are being promoted to either 4 or 8 bytes (depending on architecture) when used as array indices. Speed is very important to me, therefore should I rather store my integers as unsigned int to avoid wasting time on type promotion? This array may get absolutely massive and I would also like to save some space, but not at the cost of performance. What should I do?
EDIT: If I was to use address-type for my integers, which type should I use? size_t, maybe something else?

Anything to do with C style arrays usually gets compiled in to machine instructions that use the memory addressing of the architecture for which you compile, thus trying to save space on array indexes will not work.
If anything, you might break whatever optimizations your compiler might want to implement.
5 Million integer values, even on a 64bit machine, comes to about 40 MB RAM.
While I am sure your code does other things, this is not that much memory to sacrifice performance.
Since you chose to keep all those values in RAM in the first place, presumably for speed, don't ruin it.

How to avoid wasting memory on 64-bit pointers

I'm hoping for some high-level advice on how to approach a design I'm about to undertake.
The straightforward approach to my problem will result in millions and millions of pointers. On a 64-bit system these will presumably be 64-bit pointers. But as far as my application is concerned, I don't think I need more than a 32-bit address space. I would still like for the system to take advantage of 64-bit processor arithmetic, however (assuming that is what I get by running on a 64-bit system).
Further background
I'm implementing a tree-like data structure where each "node" contains an 8-byte payload, but also needs pointers to four neighboring nodes (parent, left-child, middle-child, right-child). On a 64-bit system using 64-bit pointers, this amounts to 32 bytes just for linking an 8-byte payload into the tree -- a "linking overhead" of 400%.
The data structure will contain millions of these nodes, but my application will not need much memory beyond that, so all these 64-bit pointers seem wasteful. What to do? Is there a way to use 32-bit pointers on a 64-bit system?
I've considered
Storing the payloads in an array in a way such that an index implies (and is implied by) a "tree address" and neighbors of a given index can be calculated with simple arithmetic on that index. Unfortunately this requires me to size the array according to the maximum depth of the tree, which I don't know beforehand, and it would probably incur even greater memory overhead due to empty node elements in the lower levels because not all branches of the tree go to the same depth.
Storing nodes in an array large enough to hold them all, and then using indices instead of pointers to link neighbors. AFAIK the main disadvantage here would be that each node would need the array's base address in order to find its neighbors. So they either need to store it (a million times over) or it needs to be passed around with every function call. I don't like this.
Assuming that the most-significant 32 bits of all these pointers are zero, throwing an exception if they aren't, and storing only the least-significant 32 bits. So the required pointer can be reconstructed on demand. The system is likely to use more than 4GB, but the process will never. I'm just assuming that pointers are offset from a process base-address and have no idea how safe (if at all) this would be across the common platforms (Windows, Linux, OSX).
Storing the difference between 64-bit this and the 64-bit pointer to the neighbor, assuming that this difference will be within the range of int32_t (and throwing if it isn't). Then any node can find it's neighbors by adding that offset to this.
Any advice? Regarding that last idea (which I currently feel is my best candidate) can I assume that in a process that uses less than 2GB, dynamically allocated objects will be within 2 GB of each other? Or not at all necessarily?

Combining ideas 2 and 4 from the question, put all the nodes into a big array, and store e.g. int32_t neighborOffset = neighborIndex - thisIndex. Then you can get the neighbor from *(this+neighborOffset). This gets rid of the disadvantages/assumptions of both 2 and 4.

If on Linux, you might consider using (and compiling for) the x32 ABI. IMHO, this is the preferred solution for your issues.
Alternatively, don't use pointers, but indexes into a huge array (or an std::vector in C++) which could be a global or static variable. Manage a single huge heap-allocated array of nodes, and use indexes of nodes instead of pointers to nodes. So like your §2, but since the array is a global or static data you won't need to pass it everywhere.
(I guess that an optimizing compiler would be able to generate clever code, which could be nearly as efficient as using pointers)

You can remove the disadvantage of (2) by exploiting the alignment of memory regions to find the base address of the the array "automatically". For example, if you want to support up to 4 GB of nodes, ensure your node array starts at a 4GB boundary.
Then within a node with address addr, you can determine the address of another at index as addr & -(1UL << 32) + index.
This is kind of the "absolute" variant of the accepted solution which is "relative". One advantage of this solution is that an index always has the same meaning within a tree, whereas in the relative solution you really need the (node_address, index) pair to interpret an index (of course, you can also use the absolute indexes in the relative scenarios where it is useful). It means that when you duplicate a node, you don't need to adjust any index values it contains.
The "relative" solution also loses 1 effective index bit relative to this solution in its index since it needs to store a signed offset, so with a 32-bit index, you could only support 2^31 nodes (assuming full compression of trailing zero bits, otherwise it is only 2^31 bytes of nodes).
You can also store the base tree structure (e.g,. the pointer to the root and whatever bookkeeping your have outside of the nodes themselves) right at the 4GB address which means that any node can jump to the associated base structure without traversing all the parent pointers or whatever.
Finally, you can also exploit this alignment idea within the tree itself to "implicitly" store other pointers. For example, perhaps the parent node is stored at an N-byte aligned boundary, and then all children are stored in the same N-byte block so they know their parent "implicitly". How feasible that is depends on how dynamic your tree is, how much the fan-out varies, etc.
You can accomplish this kind of thing by writing your own allocator that uses mmap to allocate suitably aligned blocks (usually just reserve a huge amount of virtual address space and then allocate blocks of it as needed) - ether via the hint parameter or just by reserving a big enough region that you are guaranteed to get the alignment you want somewhere in the region. The need to mess around with allocators is the primary disadvantage compared to the accepted solution, but if this is the main data structure in your program it might be worth it. When you control the allocator you have other advantages too: if you know all your nodes are allocated on an 2^N-byte boundary you can "compress" your indexes further since you know the low N bits will always be zero, so with a 32-bit index you could actually store 2^(32+5) = 2^37 nodes if you knew they were 32-byte aligned.
These kind of tricks are really only feasible in 64-bit programs, with the huge amount of virtual address space available, so in a way 64-bit giveth and also taketh away.

Your assertion that a 64 bit system necessarily has to have 64 bit pointers is not correct. The C++ standard makes no such assertion.
In fact, different pointer types can be different sizes: sizeof(double*) might not be the same as sizeof(int*).
Short answer: don't make any assumptions about the sizes of any C++ pointer.
Sounds like to me that you want to build you own memory management framework.

std::vector to hold much more than 2^32 elements

std::vector::size returns a size_t so I guess it can hold up to 2^32 elements.
Is there a standard container than can hold much more elements, e.g. 2^64 OR a way to tweak std::vector to be "indexed" by e.g. a unsigned long long?

Sure. Compile a 64-bit program. size_t will be 64 bits wide then.
But really, what you should be doing is take a step back, and consider why you need such a big vector. Because most likely, you don't, and there's a better way to solve whatever problem you're working on.

size_t doesn't have a predefined size, although it is often capped at 232 on 32 bit computers.
Since a std::vector must hold contiguous memory for all elements, you will run out of memory before exceeding the size.
Compile your program for a 64 bit computer and you'll have more space.
Better still, reconsider if std::vector is appropriate. Why do you want to hold trillions of adjacent objects directly in memory?
Consider a std::map<unsigned long long, YourData> if you only want large indexes and aren't really trying to store trillions of objects.

Casting size_t to allow more elements in a std::vector

I need to store a huge number of elements in a std::vector (more that the 2^32-1 allowed by unsigned int) in 32 bits. As far as I know this quantity is limited by the std::size_t unsigned int type. May I change this std::size_t by casting to an unsigned long? Would it resolve the problem?
If that's not possible, suppose I compile in 64 bits. Would that solve the problem without any modification?

size_t is a type that can hold size of any allocable chunk of memory. It follows that you can't allocate more memory than what fits in your size_t and thus can't store more elements in any way.
Compiling in 64-bits will allow it, but realize that the array still needs to fit in memory. 232 is 4 billion, so you are going to go over 4 * sizeof(element) GiB of memory. More than 8 GiB of RAM is still rare, so that does not look reasonable.
I suggest replacing the vector with the one from STXXL. It uses external storage, so your vector is not limited by amount of RAM. The library claims to handle terabytes of data easily.
(edit) Pedantic note: size_t needs to hold size of maximal single object, not necessarily size of all available memory. In segmented memory models it only needs to accommodate the offset when each object has to live in single segment, but with different segments more memory may be accessible. It is even possible to use it on x86 with PAE, the "long" memory model. However I've not seen anybody actually use it.

There are a number of things to say.
First, about the size of std::size_t on 32-bit systems and 64-bit systems, respectively. This is what the standard says about std::size_t (§18.2/6,7):
6 The type size_t is an implementation-deﬁned unsigned integer type that is large enough to contain the size
in bytes of any object.
7 [ Note: It is recommended that implementations choose types for ptrdiff_t and size_t whose integer
conversion ranks (4.13) are no greater than that of signed long int unless a larger size is necessary to
contain all the possible values. — end note ]
From this it follows that std::size_t will be at least 32 bits in size on a 32-bit system, and at least 64 bits on a 64-bit system. It could be larger, but that would obviously not make any sense.
Second, about the idea of type casting: For this to work, even in theory, you would have to cast (or rather: redefine) the type inside the implementation of std::vector itself, wherever it occurs.
Third, when you say you need this super-large vector "in 32 bits", does that mean you want to use it on a 32-bit system? In that case, as the others have pointed out already, what you want is impossible, because a 32-bit system simply doesn't have that much memory.
But, fourth, if what you want is to run your program on a 64-bit machine, and use only a 32-bit data type to refer to the number of elements, but possibly a 64-bit type to refer to the total size in bytes, then std::size_t is not relevant because that is used to refer to the total number of elements, and the index of individual elements, but not the size in bytes.
Finally, if you are on a 64-bit system and want to use something of extreme proportions that works like a std::vector, that is certainly possible. Systems with 32 GB, 64 GB, or even 1 TB of main memory are perhaps not extremely common, but definitely available.
However, to implement such a data type, it is generally not a good idea to simply allocate gigabytes of memory in one contiguous block (which is what a std::vector does), because of reasons like the following:
Unless the total size of the vector is determined once and for all at initialization time, the vector will be resized, and quite likely re-allocated, possibly many times as you add elements. Re-allocating an extremely large vector can be a time-consuming operation. [ I have added this item as an edit to my original answer. ]
The OS will have difficulties providing such a large portion of unfragmented memory, as other processes running in parallel require memory, too. [Edit: As correctly pointed out in the comments, this isn't really an issue on any standard OS in use today.]
On very large servers you also have tens of CPUs and typically NUMA-type memory architectures, where it is clearly preferable to work with relatively smaller chunks of memory, and have multiple threads (possibly each running on a different core) access various chunks of the vector in parallel.
Conclusions
A) If you are on a 32-bit system and want to use a vector that large, using disk-based methods such as the one suggested by #JanHudec is the only thing that is feasible.
B) If you have access to a large 64-bit system with tens or hundreds of GB, you should look into an implementation that divides the entire memory area into chunks. Essentially something that works like a std::vector<std::vector<T>>, where each nested vector represents one chunk. If all chunks are full, you append a new chunk, etc. It is straight-forward to implement an iterator type for this, too. Of course, if you want to optimize this further to take advantage of multi-threading and NUMA features, it will get increasingly complex, but that is unavoidable.

A vector might be the wrong data structure for you. It requires storage in a single block of memory, which is limited by the size of size_t. This you can increase by compiling for 64 bit systems, but then you can't run on 32 bit systems which might be a requirement.
If you don't need vector's particular characteristics (particularly O(1) lookup and contiguous memory layout), another structure such as a std::list might suit you, which has no size limits except what the computer can physically handle as it's a linked list instead of a conveniently-wrapped array.

Declaring large character array in c++

I am trying right now to declare a large character array. I am using the character array as a bitmap (as in a map of booleans, not the image file type). The following code generates a compilation error.
//This is code before main. I want these as globals.
unsigned const long bitmap_size = (ULONG_MAX/(sizeof(char)));
char bitmap[bitmap_size];
The error is overflow in array dimension. I recognize that I'm trying to have my process consume a lot of data and that there might be some limit in place that prevents me from doing so. I am curious as to whether I am making a syntax error or if I need to request more resources from the kernel. Also, I have no interest in creating a bitmap with some class. Thank you for your time.
EDIT
ULONG_MAX is very much dependent upon the machine that you are using. On the particular machine I was compiling my code on it was well over 4.2 billion. All in all, I wouldn't to use that constant like a constant, at least for the purpose of memory allocation.

ULONG_MAX/sizeof(char) is the same as ULONG_MAX, which is a very large number. So large, in fact, that you don't have room for it even in virtual memory (because ULONG_MAX is probably the number of bytes in your entire virtual memory).
You definitely need to rethink what you are trying to do.

It's impossible to declare an array that large on most systems -- on a 32-bit system, that array is 4 GB, which doesn't fit into the available address space, and on most 64-bit systems, it's 16 exabytes (16 million terabytes), which doesn't fit into the available address space there either (and, incidentally, may be more memory than exists on the entire planet).
Use malloc() to allocate large amounts of memory. But be realistic. :)

As I understand it, the maximum size of an array in c++ is the largest integer the platform supports. It is likely that your long-type bitmap_size constant exceeds that limit.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js