Will changing my array type to Int16 save storage? - c++

This is the first time I'm working with large data sets in C++. Before, I'd just store whatever I wanted in arrays. Because I'm generating 4000x4000 heightmaps now, I want to be a bit more memory-efficient, before my program is suddenly gulping up 100 MB (a little exorbitant for an indie fellow like me).
I'm going to load most of the data in chunks, but because my program needs a large amount of the data during execution, I'll still end up with large 2D-arrays in use. I want to find a way to make them occupy very little memory.
My heightmap is designed so that it only takes on small integer values; possibly small enough to fit into an Int8, but at the very least small enough for an Int16. So far, I've been using Int types, which on my implementation are of the Int32 variety.
If I were to switch my 2D arrays and vectors to the Int16 type, would this save me half the storage? Or would one element of the array still take the full four bytes and simply leave the untouched bits at zero?

If you are talking about the fixed-width integer types, then yes int16_t is guaranteed to be 16 bits without padding bits. You may also verify this yourself:
std::cout << sizeof(int16_t[1000]) << " " << sizeof(int32_t[1000]);

Switching to Int16 (assuming this is the same as int16_t) from Int32 (assuming this is the same as int32_t) would cut the memory consumption of your array in half (not counting any bookkeeping that goes along with a possibly dynamically sized array).
Also, int16_t takes 2 bytes and int32_t takes 4 bytes; the 16 in the name means 16 bits.
There are no untouched bits, only bits you happen not to use at runtime. Unlike some pointer types, integers have no unused bits; every 0 and 1 contributes to the integer's value.
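As a rough back-of-the-envelope check for the 4000x4000 case from the question (a minimal sketch; the figures assume the usual 1-, 2- and 4-byte widths of the fixed-width types):
#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
    const std::size_t elements = 4000ull * 4000ull;  // 16 million height samples
    std::cout << elements * sizeof(int32_t) / (1024 * 1024) << " MiB as int32_t\n";  // ~61 MiB
    std::cout << elements * sizeof(int16_t) / (1024 * 1024) << " MiB as int16_t\n";  // ~30 MiB
    std::cout << elements * sizeof(int8_t)  / (1024 * 1024) << " MiB as int8_t\n";   // ~15 MiB
}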

Related

avoiding array index promotion in hopes of better performance

I have a huge array of integers, and those integers are not greater than 0xFFFF. Therefore I would like to save some space and store them as unsigned short.
unsigned short addresses[50000 /* big number over here */];
Later I would use this array as follows:
data[addresses[i]];
When I use only 2 bytes to store my integers, they are promoted to either 4 or 8 bytes (depending on architecture) when used as array indices. Speed is very important to me, so should I store my integers as unsigned int instead, to avoid wasting time on type promotion? This array may get absolutely massive and I would also like to save some space, but not at the cost of performance. What should I do?
EDIT: If I were to use an address-sized type for my integers, which type should I use? size_t, or maybe something else?
Anything to do with C-style arrays usually gets compiled into machine instructions that use the native memory addressing of the architecture you compile for, so trying to save space on array indexes will not work.
If anything, you might break whatever optimizations your compiler might want to implement.
Five million integer values, even on a 64-bit machine, come to about 40 MB of RAM.
While I am sure your code does other things, that is not so much memory that it is worth sacrificing performance over.
Since you chose to keep all those values in RAM in the first place, presumably for speed, don't ruin it.
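For reference, a sketch of the access pattern from the question (the array names mirror the question and the sizes are illustrative); the stored elements stay at 2 bytes while the loop counter itself is a native-width size_t:
#include <cstddef>
#include <cstdint>

uint16_t addresses[50000];   // compact storage: 2 bytes per stored index
int      data[0x10000];      // the table being looked up; 16-bit indices cover it

int sum_lookups() {
    int sum = 0;
    // The loop counter is a native-width size_t; each uint16_t element is
    // zero-extended to register width when loaded, which is a normal load on
    // common architectures rather than a separate "promotion" step.
    for (std::size_t i = 0; i < 50000; ++i)
        sum += data[addresses[i]];
    return sum;
}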

Memory alignment, structs and malloc

It's a bit hard to formulate what I want to know in a single question, so I'll try to break it down.
For example purposes, let's say we have the following struct:
struct X {
    uint8_t a;
    uint16_t b;
    uint32_t c;
};
1. Is it true that the compiler is guaranteed to never rearrange the order of X's members, only add padding where necessary? In other words, is it always true that offsetof(X, a) < offsetof(X, c)?
2. Is it true that the compiler will pick the largest alignment among X's members and use it to align objects of type X (i.e. addresses of X instances will be divisible by the largest alignment among X's members)?
3. Since malloc does not know anything about the type of objects that we're going to store when we allocate a buffer, how does it choose an alignment for the returned address? Does it simply return an address that is divisible by the largest alignment possible (in which case, no matter what structure we put in the buffer, the memory accesses will always be aligned)?
1. Yes.
2. No, the compiler will use its knowledge of the target host hardware to select the best alignment.
3. See question 2.
Since malloc does not know anything about the type of objects that we're going to store when we allocate a buffer, how does it choose an alignment for the returned address?
malloc(3) returns "memory that is suitably aligned for any kind of variable."
Does it simply return an address that is divisible by the largest alignment possible (in which case, no matter what structure we put in the buffer, the memory accesses will always be aligned)?
Yes, but watch your compliance with the strict aliasing rule.
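A quick way to observe that guarantee (just a sketch):
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>

int main() {
    void* p = std::malloc(100);
    // malloc must return memory suitably aligned for any fundamental type,
    // i.e. at least alignof(std::max_align_t); the remainder below prints 0.
    std::cout << reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) << "\n";
    std::free(p);
}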
The compiler will do whatever is most beneficial on that computer in the largest number of circumstances. On most platforms, loading bus-wide values at bus-width offsets is fastest.
That means that on 32-bit computers compilers will generally choose to align 32-bit numbers on 4-byte offsets, and on 64-bit computers 64-bit values are aligned on 8-byte offsets.
On most computers, smaller values like 8-bit and 16-bit values are slower to load: the surrounding 4 or 8 bytes are probably loaded and the byte or two you need are masked off.
When you have special circumstances you can override the compiler by specifying the alignment and the padding. You might do this when you know fast loading is not important, but you really want to pack the data tightly. Or when you are playing very subtle tricks with casting and unions.
Memory allocation routines on almost any modern computer will always return memory that is aligned on at least the bus width of the platform (e.g. 4 or 8 bytes), or even more, such as 16-byte alignment.
When you call "malloc" you are responsible for knowing the size of the structures you need. Luckily the compiler will tell you the size of any structure with "sizeof". That means that if you pack a structure to save memory, sizeof will return a smaller value than it would for an unpacked structure. So you really will save memory, if you are allocating small structures in large arrays of them.
If you allocate small packed structures one at a time, then whether you pack them or not won't make any difference. That is because when you allocate some odd small piece of memory, the allocator will actually use significantly more memory than that. It will allocate a conveniently sized block of memory for you, and then an additional block of memory for itself to keep track of your allocation.
Which is why, if you care about memory use and want to pack your structures, you definitely don't want to allocate them one at a time.
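To make the padding and alignment concrete, here is a small probe (just a sketch) for the X struct from the question; the commented values are typical, not guaranteed:
#include <cstddef>
#include <cstdint>
#include <iostream>

struct X {
    uint8_t  a;
    uint16_t b;
    uint32_t c;
};

int main() {
    std::cout << "sizeof(X)      = " << sizeof(X)      << "\n";  // typically 8: a, 1 padding byte, b, c
    std::cout << "alignof(X)     = " << alignof(X)     << "\n";  // typically 4, the strictest member alignment
    std::cout << "offsetof(X, a) = " << offsetof(X, a) << "\n";  // 0
    std::cout << "offsetof(X, b) = " << offsetof(X, b) << "\n";  // typically 2
    std::cout << "offsetof(X, c) = " << offsetof(X, c) << "\n";  // typically 4
}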

Casting size_t to allow more elements in a std::vector

I need to store a huge number of elements in a std::vector (more than the 2^32-1 allowed by unsigned int) in 32 bits. As far as I know, this quantity is limited by the std::size_t unsigned int type. May I change this std::size_t by casting to an unsigned long? Would that resolve the problem?
If that's not possible, suppose I compile in 64 bits. Would that solve the problem without any modification?
size_t is a type that can hold the size of any allocatable chunk of memory. It follows that you can't allocate more memory than what fits in your size_t, and thus can't store more elements in any way.
Compiling in 64-bit mode will allow it, but realize that the array still needs to fit in memory. 2^32 is about 4 billion, so you are going to go over 4 * sizeof(element) GiB of memory. More than 8 GiB of RAM is still rare, so that does not look reasonable.
I suggest replacing the vector with the one from STXXL. It uses external storage, so your vector is not limited by the amount of RAM. The library claims to handle terabytes of data easily.
(edit) Pedantic note: size_t needs to hold the size of the largest single object, not necessarily the size of all available memory. In segmented memory models it only needs to accommodate the offset, since each object has to live in a single segment, but with multiple segments more memory may be addressable. It is even possible to use this on x86 with PAE, the "long" memory model. However, I've not seen anybody actually use it.
There are a number of things to say.
First, about the size of std::size_t on 32-bit systems and 64-bit systems, respectively. This is what the standard says about std::size_t (§18.2/6,7):
6 The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
7 [ Note: It is recommended that implementations choose types for ptrdiff_t and size_t whose integer conversion ranks (4.13) are no greater than that of signed long int unless a larger size is necessary to contain all the possible values. — end note ]
From this it follows that std::size_t will be at least 32 bits in size on a 32-bit system, and at least 64 bits on a 64-bit system. It could be larger, but that would obviously not make any sense.
Second, about the idea of type casting: For this to work, even in theory, you would have to cast (or rather: redefine) the type inside the implementation of std::vector itself, wherever it occurs.
Third, when you say you need this super-large vector "in 32 bits", does that mean you want to use it on a 32-bit system? In that case, as the others have pointed out already, what you want is impossible, because a 32-bit system simply doesn't have that much memory.
But, fourth, if what you want is to run your program on a 64-bit machine, and use only a 32-bit data type to refer to the number of elements, but possibly a 64-bit type to refer to the total size in bytes, then std::size_t is not relevant because that is used to refer to the total number of elements, and the index of individual elements, but not the size in bytes.
Finally, if you are on a 64-bit system and want to use something of extreme proportions that works like a std::vector, that is certainly possible. Systems with 32 GB, 64 GB, or even 1 TB of main memory are perhaps not extremely common, but definitely available.
However, to implement such a data type, it is generally not a good idea to simply allocate gigabytes of memory in one contiguous block (which is what a std::vector does), because of reasons like the following:
Unless the total size of the vector is determined once and for all at initialization time, the vector will be resized, and quite likely re-allocated, possibly many times as you add elements. Re-allocating an extremely large vector can be a time-consuming operation. [ I have added this item as an edit to my original answer. ]
The OS will have difficulties providing such a large portion of unfragmented memory, as other processes running in parallel require memory, too. [Edit: As correctly pointed out in the comments, this isn't really an issue on any standard OS in use today.]
On very large servers you also have tens of CPUs and typically NUMA-type memory architectures, where it is clearly preferable to work with relatively smaller chunks of memory, and have multiple threads (possibly each running on a different core) access various chunks of the vector in parallel.
Conclusions
A) If you are on a 32-bit system and want to use a vector that large, using disk-based methods such as the one suggested by @JanHudec is the only thing that is feasible.
B) If you have access to a large 64-bit system with tens or hundreds of GB, you should look into an implementation that divides the entire memory area into chunks. Essentially something that works like a std::vector<std::vector<T>>, where each nested vector represents one chunk. If all chunks are full, you append a new chunk, etc. It is straightforward to implement an iterator type for this, too. Of course, if you want to optimize this further to take advantage of multi-threading and NUMA features, it will get increasingly complex, but that is unavoidable.
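For illustration, a bare-bones sketch of that chunked idea (the class name and chunk size are invented; a real version would add iterators, const access and bounds checking):
#include <cstddef>
#include <vector>

// Elements live in fixed-size blocks, so no single allocation ever has to
// cover the whole data set and growth never re-allocates existing chunks.
template <typename T, std::size_t ChunkSize = (1u << 20)>
class chunked_vector {
    std::vector<std::vector<T>> chunks_;
public:
    void push_back(const T& value) {
        if (chunks_.empty() || chunks_.back().size() == ChunkSize) {
            chunks_.emplace_back();
            chunks_.back().reserve(ChunkSize);   // one chunk at a time, never one huge block
        }
        chunks_.back().push_back(value);
    }
    T& operator[](std::size_t i) { return chunks_[i / ChunkSize][i % ChunkSize]; }
    std::size_t size() const {
        return chunks_.empty() ? 0
                               : (chunks_.size() - 1) * ChunkSize + chunks_.back().size();
    }
};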
A vector might be the wrong data structure for you. It requires storage in a single block of memory, which is limited by the range of size_t. That you can increase by compiling for 64-bit systems, but then you can't run on 32-bit systems, which might be a requirement.
If you don't need vector's particular characteristics (particularly O(1) lookup and contiguous memory layout), another structure such as a std::list might suit you, which has no size limits except what the computer can physically handle as it's a linked list instead of a conveniently-wrapped array.

Why use array of char instead of int for bitsets in C++

For a project I'm working on I need to create my own implementation for a bitset. I've taken a look at the STL library to see how they handle this and looked at a few other things online. It seems like it's pretty standard to use a char array. Is there a reason why everyone uses char arrays instead of the integer type?
Simply because a char in C++ is a single byte (at the very least, it's guaranteed by the C++ standard to be no larger than a short or an int), whereas an int is usually larger than a byte (typically 32 bits, or 4 bytes, on most machines these days). Since a single byte is the smallest addressable unit of data a computer can process, it's natural to use arrays of chars when working with individual bits. If you used int, for example, you would waste significant space whenever the number of bits is not a multiple of the bits in an int, but with a byte array you waste the least amount of space possible.
Char is (usually) the smallest unit of bits that a microprocessor can manipulate. If you're creating an object that works with arbitrary numbers of bits, it makes sense to use an array of the smallest unit. That way you always use the fewest units possible.
If you need a non-arbitrary-sized bitset and the processor has a native type large enough to contain it, use an N-bit type. It will be more efficient than an array.
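For illustration, a minimal sketch of a byte-backed bitset (the class and member names are invented; std::bitset and std::vector<bool> are the ready-made options):
#include <cstddef>
#include <cstdint>
#include <vector>

// One uint8_t holds eight flags, so at most seven bits are ever wasted,
// no matter how many bits the caller asks for.
class byte_bitset {
    std::vector<uint8_t> bytes_;
public:
    explicit byte_bitset(std::size_t nbits) : bytes_((nbits + 7) / 8, 0) {}

    void set(std::size_t i, bool value) {
        const uint8_t mask = uint8_t(1u << (i % 8));
        if (value) bytes_[i / 8] |= mask;
        else       bytes_[i / 8] &= uint8_t(~mask);
    }
    bool test(std::size_t i) const {
        return (bytes_[i / 8] >> (i % 8)) & 1u;
    }
};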

C++ 2.5 bytes (20-bit) integer

I know it's ridiculous, but I need it for storage optimization. Is there any good way to implement it in C++?
It has to be flexible enough that I can use it as a normal data type, e.g. Vector<int20>, operator overloading, etc.
If storage is your main concern, I suspect you need quite a few 20-bit variables. How about storing them in pairs? You could create a class representing two such variables and store them in 2.5+2.5 = 5 bytes.
To access the variables conveniently you could overload the [] operator so you could write:
int fst = pair[0];
int snd = pair[1];
Since you may want to allow for manipulations such as
pair[1] += 5;
you would not want to return a copy of the backing bytes, but a reference. However, you can't return a direct reference to the backing bytes (since it would mess up its neighboring value), so you'd actually need to return a proxy for the backing bytes (which in turn has a reference to the backing bytes) and let the proxy overload the relevant operators.
As a matter of fact, as @Tony suggests, you could generalize this to have a general container holding N such 20-bit variables.
(I've done this myself in a specialization of a vector for efficient storage of booleans (as single bits).)
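A rough sketch of that pair-with-proxy idea (the class and member names are invented for illustration; const access and error handling are omitted):
#include <cstdint>

// Two 20-bit values packed into 5 bytes.
class int20_pair {
    uint8_t bytes_[5] = {};                       // 40 bits = two 20-bit slots

    uint64_t load() const {                       // assemble the 40-bit payload
        uint64_t v = 0;
        for (int i = 0; i < 5; ++i) v |= uint64_t(bytes_[i]) << (8 * i);
        return v;
    }
    void store(uint64_t v) {                      // write the 40-bit payload back
        for (int i = 0; i < 5; ++i) bytes_[i] = uint8_t(v >> (8 * i));
    }

public:
    uint32_t get(int idx) const { return uint32_t(load() >> (20 * idx)) & 0xFFFFF; }

    void set(int idx, uint32_t value) {
        uint64_t v = load();
        v &= ~(uint64_t(0xFFFFF) << (20 * idx));          // clear the slot
        v |= uint64_t(value & 0xFFFFF) << (20 * idx);     // write the new value
        store(v);
    }

    // Proxy returned by operator[], so both reads and writes go through get/set.
    class proxy {
        int20_pair& p_;
        int idx_;
    public:
        proxy(int20_pair& p, int idx) : p_(p), idx_(idx) {}
        operator uint32_t() const { return p_.get(idx_); }
        proxy& operator=(uint32_t v) { p_.set(idx_, v); return *this; }
        proxy& operator+=(uint32_t v) { p_.set(idx_, p_.get(idx_) + v); return *this; }
    };
    proxy operator[](int idx) { return proxy(*this, idx); }
};
With this in place, pair[1] += 5; from above works by reading slot 1, adding 5 and masking the result back to 20 bits, and sizeof(int20_pair) is exactly 5.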
No... you can't do that as a single value-semantic type... any class data must be a multiple of the 8-bit character size (inviting all the usual quips about CHAR_BIT, etc.).
That said, let's clutch at straws...
Unfortunately, you're obviously handling very many data items. If there are more than 64k of them, any proxy object into a custom container of packed values will probably need a >16-bit index/handle too, but this is still one of the few possibilities I can see that is worth further consideration. It might be suitable if you only need to actively work with, and have value-semantic behaviour for, a small subset of the values at any one point in time.
struct Proxy
{
    Int20_Container& container_; // might not need if a singleton
    Int20_Container::size_type index_;
    ...
};
So, the proxy might be 32, 64 or more bits - the potential benefit is only if you can create them on the fly from indices into the container, have them write directly back into the container, and keep them short-lived with few concurrently. (One simple way - not necessarily the fastest - to implement this model is to use an STL bitset or vector as the Int20_Container, and either store 20 times the logical index in index_, or multiply on the fly.)
It's also vaguely possible that although your values range over a 20-bit space, you've less than say 64k distinct values in actual use. If you have some such insight into your data set, you can create a lookup table where 16-bit array indices map to 20-bit values.
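A tiny sketch of that lookup-table idea (names invented; it assumes you can enumerate the distinct values up front):
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint32_t> distinct_values;   // at most ~64k entries, each holding a 20-bit value
std::vector<uint16_t> codes;             // one 2-byte code per data item, indexing distinct_values

uint32_t value_at(std::size_t i) {
    return distinct_values[codes[i]];    // decode: 16-bit code -> 20-bit value
}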
Use a class. As long as you respect the copy/assign/clone/etc... STL semantics, you won't have any problem.
But it will not optimize the memory space on your computer. Especially if you put it in a flat array, the 20-bit value will likely be aligned on a 32-bit boundary, so the benefit of a 20-bit type there is lost.
In that case, you will need to define your own optimized array type, that could be compatible with the STL. But don't expect it to be fast. It won't be.
Use a bitfield. (I'm really surprised nobody has suggested this.)
struct int20_and_something_else {
    int less_than_a_million : 20;
    int less_than_four_thousand : 12; // total 32 bits
};
This only works as a mutual optimization of elements in a structure, where you can spackle the gaps with some other data. But it works very well!
If you truly need to optimize a gigantic array of 20-bit numbers and nothing else, there is:
struct int20_x3 {
    int one : 20;
    int two : 20;
    int three : 20; // 60 bits is almost 64
    void set( int index, int value );
    int get( int index );
};
You can add getter/setter functions to make it prettier if you like, but you can't take the address of a bitfield, and they can't participate in an array. (Of course, you can have an array of the struct.)
Use as:
int20_x3 *big_array = new int20_x3[ array_size / 3 + 1 ];
big_array[ index / 3 ].set( index % 3, value );
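One possible way to fill in the declared set/get members (a sketch only, repeating the struct so it stands alone): since bitfields can't be addressed or indexed, the selection has to be done by hand.
#include <stdexcept>

struct int20_x3 {
    int one   : 20;
    int two   : 20;
    int three : 20;   // 60 bits is almost 64

    void set(int index, int value) {
        switch (index) {                 // bitfields have no address, so dispatch by hand
            case 0: one   = value; break;
            case 1: two   = value; break;
            case 2: three = value; break;
            default: throw std::out_of_range("int20_x3 index");
        }
    }
    int get(int index) const {
        switch (index) {
            case 0: return one;
            case 1: return two;
            case 2: return three;
            default: throw std::out_of_range("int20_x3 index");
        }
    }
};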
You can use C++ std::bitset. Store everything in a bitset and access your data using the correct index (the logical index multiplied by 20).
You're not going to be able to get exactly 20 bits as a type (even with a bit-packed struct), as it will always be aligned (at the smallest granularity) to a byte. IMO the only way to go, if you must have 20 bits, is to create a bitstream to handle the data (which you can overload to accept indexing, etc.).
You can use the union keyword to create a bit field. I've used it way back when bit fields were a necessity. Otherwise, you can create a class that holds 3 bytes, but through bitwise operations exposes just the most significant 20 bits.
As far as I know that isn't possible.
The easiest option would be to define a custom type that uses an int32_t as the backing storage and implements the appropriate maths via operator overloads.
For better storage density, you could store 3 int20 in a single int64_t value.
Just an idea: use optimized storage (5 bytes for two instances), and for operations, convert it into 32-bit int and then back.
It's possible to do this in a number of ways.
One possibility would be to use bit twiddling to store them as the left and right halves of a 5-byte array, with a class whose store/retrieve methods convert your desired array entry to an entry in the byte5[] array and extract the left or right half as appropriate.
However, most hardware requires integers to be word aligned, so as well as the bit twiddling to extract the integer you would need some bit shifting to align it properly.
I think it would be more efficient to increase your swap space and let virtual memory take care of your large array (after all, 20 vs. 32 bits is not much of a saving!), always assuming you have a 64-bit OS.