Unions between pointers and data, possible pitfalls? - c++

I'm programming a system which has a massive amount of redundant data that needs to be kept in memory and accessible with as little latency as possible (uncompressed, the data is guaranteed to occupy at least 1 GB of memory).
One such method I thought of is creating a container class like the following:
class Chunk{
public:
Chunk(){ ... };
~Chunk() { /*carefully delete elements according to mask*/ };
Uint32 getElement(int index);
void setElement(int index, Uint32 value);
private:
unsigned char mask; // on bit == data is not-redundant, array is 8x8, 64 elements
union{
Uint32 redundant; // all 8 elements are this value if mask bit == 0
Uint32 * ptr; // pointer to 8 allocated elements if mask bit == 1
}array[8];
};
My question is: are there any unforeseen consequences of using a union to switch between a Uint32 primitive and a Uint32* pointer?

This approach should be safe on all C++ implementations.
Note, however, that if you know your platform's memory alignment requirements, you may be able to do better than this. In particular, if you know that memory allocations are aligned to 2 bytes or greater (many platforms use 8 or 16 bytes), you can use the lower bit of the pointer as a flag:
class Chunk {
//...
uintptr_t ptr;
};
// In your get function:
if ( (ptr & 1) == 0 ) {
return ((uint32_t *)ptr)[index];
} else {
return *((uint32_t *)(ptr & ~(uintptr_t)1)); // clear the tag bit before dereferencing
}
You can further reduce space usage by using a custom allocation method (with placement new) and placing the pointer immediately after the class, in a single memory allocation (ie, you'll allocate room for Chunk and either the mask or the array, and have ptr point immediately after Chunk). Or, if you know most of your data will have the low bit off, you can use the ptr field directly as the fill-in value:
} else {
return ptr & ~(uintptr_t)1; // clear the tag bit to recover the value
}
If it's the high bit that's usually unused, a bit of bit shifting will work:
} else {
return ptr >> 1;
}
Note that this approach of tagging pointers is not portable. It is only safe if you can ensure your memory allocations will be properly aligned. On most desktop OSes, this will not be a problem - malloc already ensures some degree of alignment; on Unixes, you can be absolutely sure by using posix_memalign. If you can obtain such a guarantee for your platform, though, this approach can be quite effective.
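To make the tagging idea above concrete, here is a minimal sketch of one row of the Chunk. It assumes that new[] returns addresses with the low bit clear and that uintptr_t is wider than 32 bits, so a 32-bit fill value fits in the upper bits. TaggedRow and its members are hypothetical names, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type illustrating the low-bit tag.
// Low bit 0: the word is a pointer to 8 heap-allocated values.
// Low bit 1: the word holds one redundant value in its upper bits.
class TaggedRow {
public:
    static_assert(sizeof(std::uintptr_t) > sizeof(std::uint32_t),
                  "sketch assumes pointers wider than 32 bits");

    explicit TaggedRow(std::uint32_t fill)
        : word_((static_cast<std::uintptr_t>(fill) << 1) | 1u) {}

    ~TaggedRow() { if (is_pointer()) delete[] as_pointer(); }

    TaggedRow(const TaggedRow&) = delete;
    TaggedRow& operator=(const TaggedRow&) = delete;

    bool is_pointer() const { return (word_ & 1u) == 0; }

    std::uint32_t get(std::size_t index) const {
        assert(index < 8);
        return is_pointer() ? as_pointer()[index]
                            : static_cast<std::uint32_t>(word_ >> 1);
    }

    void set(std::size_t index, std::uint32_t value) {
        assert(index < 8);
        if (!is_pointer()) {
            std::uint32_t fill = get(0);
            if (value == fill) return;            // row is still redundant
            std::uint32_t* p = new std::uint32_t[8];
            for (std::size_t i = 0; i < 8; ++i) p[i] = fill;
            word_ = reinterpret_cast<std::uintptr_t>(p);
            assert((word_ & 1u) == 0);            // the alignment assumption
        }
        as_pointer()[index] = value;
    }

private:
    std::uint32_t* as_pointer() const {
        return reinterpret_cast<std::uint32_t*>(word_);
    }
    std::uintptr_t word_;
};
```

A row built with TaggedRow row(7) stays a single word until set() stores a value different from 7, at which point it allocates the 8-element array. The assert after the allocation is where a platform that breaks the alignment assumption would be caught in a debug build.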

If space is at a premium you may be wasting memory. A union is as large as its largest member, which in this case could be up to 64 bits for the pointer.
If you stick to 32-bit architectures you should not have problems with the cast.
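The size point can be checked at compile time. The sketch below mirrors the union from the question, with std::uint32_t standing in for Uint32 (an assumption about that typedef); Slot and unused_bytes_when_redundant are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the union from the question.
union Slot {
    std::uint32_t redundant;
    std::uint32_t* ptr;
};

// A union is at least as large as each of its members.
static_assert(sizeof(Slot) >= sizeof(std::uint32_t*), "must hold a pointer");
static_assert(sizeof(Slot) >= sizeof(std::uint32_t), "must hold a value");

// Bytes per slot that go unused while the slot stores only the 32-bit value.
constexpr std::size_t unused_bytes_when_redundant() {
    return sizeof(Slot) - sizeof(std::uint32_t);
}
```

On a platform with 8-byte pointers this would typically report 4 unused bytes per redundant slot; on a 32-bit platform it would typically report none.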

Related

C++11 : Does new return contiguous memory?

float* tempBuf = new float[maxVoices]();
Will the above result in
1) memory that is 16-byte aligned?
2) memory that is confirmed to be contiguous?
What I want is the following:
float tempBuf[maxVoices] __attribute__ ((aligned));
but as heap memory, that will be effective for Apple Accelerate framework.
Thanks.
The memory will be aligned for float, but not necessarily for CPU-specific SIMD instructions. I strongly suspect on your system sizeof(float) < 16, though, which means it's not as aligned as you want. The memory will be contiguous: &A[i] == &A[0] + i.
If you need something more specific, new std::aligned_storage<Length, Alignment> will return a suitable region of memory, presuming of course that you did in fact pass a more specific alignment.
Another alternative is struct alignas(16) FourFloats { float floats[4]; }; - this may map more naturally to the framework. You'd now need to do new FourFloats[(maxVoices+3)/4].
Yes, new returns contiguous memory.
As for alignment, no such alignment guarantee is provided. Try this:
template<class T, size_t A>
T* over_aligned(size_t N){
static_assert(A <= alignof(std::max_align_t),
"Over-alignment is implementation-defined."
);
static_assert( std::is_trivially_destructible<T>{},
"Function does not store number of elements to destroy"
);
using Helper=std::aligned_storage_t<sizeof(T), A>;
auto* ptr = new Helper[(N*sizeof(T)+sizeof(Helper)-1)/sizeof(Helper)]; // enough Helpers to cover N objects of type T
return new(ptr) T[N];
}
Use:
float* f = over_aligned<float,16>(37);
Makes an array of 37 floats, with the buffer aligned to 16 bytes. Or it fails to compile.
If the assert fails, it can still work. Test and consult your compiler documentation. Once convinced, put compiler-specific version guards around the static assert, so when you change compilers you can test all over again (yay).
If you want real portability, you have to have a fall back to std::align and manage resources separately from data pointers and count the number of T if and only if T has a non-trivial destructor, then store the number of T "before" the start of your buffer. It gets pretty silly.
It's guaranteed to be aligned properly with respect to the type you're allocating. So if it's an array of 4 floats (each supposedly 4 bytes), it's guaranteed to provide a usable sequence of floats. It's not guaranteed to be aligned to 16 bytes.
Yes, it's guaranteed to be contiguous (otherwise what would be the meaning of a single pointer?)
If you want it to be aligned to some K bytes, you can do it manually with std::align. See MSalter's answer for a more efficient way of doing this.
If tempBuf is not nullptr then the C++ standard guarantees that tempBuf points to the zeroth element of at least maxVoices contiguous floats.
(Don't forget to call delete[] tempBuf once you're done with it.)
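For the manual std::align route mentioned above, a minimal C++11 sketch could look like the following. It over-allocates a byte buffer by one alignment's worth of slack and lets std::align pick the aligned start. AlignedFloats and make_aligned_floats are hypothetical names; the owning byte buffer is kept separately from the aligned pointer because only the former may be freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical helper: owns a raw byte buffer and exposes an aligned float view.
struct AlignedFloats {
    std::unique_ptr<unsigned char[]> storage; // what actually gets freed
    float* data;                              // aligned pointer into storage
};

inline AlignedFloats make_aligned_floats(std::size_t count, std::size_t alignment) {
    // Alignment must be a power of two and at least as strict as float's own.
    assert(alignment >= alignof(float) && (alignment & (alignment - 1)) == 0);
    std::size_t bytes = count * sizeof(float);
    std::size_t space = bytes + alignment;    // slack so an aligned start exists
    AlignedFloats result;
    result.storage.reset(new unsigned char[space]);
    void* cursor = result.storage.get();
    void* aligned = std::align(alignment, bytes, cursor, space);
    assert(aligned != nullptr);
    float* first = static_cast<float*>(aligned);
    for (std::size_t i = 0; i < count; ++i) {
        new (first + i) float(0.0f);          // start each float's lifetime at zero
    }
    result.data = first;
    return result;
}
```

Calling make_aligned_floats(maxVoices, 16) would give a zeroed, 16-byte aligned buffer whose lifetime is tied to the returned object, so no explicit delete[] is needed.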

What's the purpose of align C++ pointer position

I am now reading the source code of OpenCV, an open source computer vision library. I am confused by this function:
#define CV_MALLOC_ALIGN 16
void* fastMalloc( size_t size )
{
uchar* udata = (uchar*)malloc(size + sizeof(void*) + CV_MALLOC_ALIGN);
if(!udata)
return OutOfMemoryError(size);
uchar** adata = alignPtr((uchar**)udata + 1, CV_MALLOC_ALIGN);
adata[-1] = udata;
return adata;
}
/*!
Aligns pointer by the certain number of bytes
This small inline function aligns the pointer by the certain number of bytes by
shifting it forward by 0 or a positive offset.
*/
template<typename _Tp> static inline _Tp* alignPtr(_Tp* ptr, int n=(int)sizeof(_Tp))
{
return (_Tp*)(((size_t)ptr + n-1) & -n);
}
fastMalloc is used to allocate memory for a pointer; it invokes malloc and then alignPtr. I do not understand why alignPtr is called after the memory is allocated. My basic understanding is that doing so makes it much faster for the machine to find the pointer. Are there references on this issue on the internet? Is it still necessary to perform this operation on modern computers? Any ideas will be appreciated.
Some platforms require certain types of data to appear on certain byte boundaries (e.g. some compilers require pointers to be stored on 4-byte boundaries).
This is called alignment, and it calls for extra padding within, and possibly at the end of, the object's data.
If data is not properly aligned, the program may fault on platforms that require alignment, or reads may become a performance bottleneck (because two blocks must be read to get the same data).
EDITED IN RESPONSE TO COMMENT:-
Memory request by a program is generally handled by memory allocator. One such memory allocator is fixed-size allocator. Fixed size allocation return chunks of specified size even if requested memory is less than that particular size. So, with that background let me try to explain what's going on here:-
uchar* udata = (uchar*)malloc(size + sizeof(void*) + CV_MALLOC_ALIGN);
This allocates more memory than was requested: size bytes, plus sizeof(void*) bytes to hold the original pointer, plus CV_MALLOC_ALIGN bytes of slack so that a suitably aligned address is guaranteed to exist inside the block.
uchar** adata = alignPtr((uchar**)udata + 1, CV_MALLOC_ALIGN);
This is trying to align pointer to specific boundary as explained above.
It allocates a block a bit bigger than it was asked for.
Then it sets adata to the next properly aligned address (skip past one pointer's worth of bytes, then round up to the next properly aligned address).
Then it stores the original pointer before the new address. I assume this is later used to free the originally allocated block.
And then we return the new address.
This only makes sense if CV_MALLOC_ALIGN is a stricter alignment than malloc guarantees - perhaps a cache line?
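The steps described above, together with the matching free that the stored pointer makes possible, can be sketched as follows. This is a simplified illustration of the pattern and not OpenCV's actual code; aligned_malloc_sketch and aligned_free_sketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns a block of `size` bytes whose address is a multiple of `alignment`.
inline void* aligned_malloc_sketch(std::size_t size, std::size_t alignment) {
    // Power of two, and strict enough that the stored pointer is itself aligned.
    assert(alignment >= alignof(void*) && (alignment & (alignment - 1)) == 0);
    // Room for the payload, one stored pointer, and alignment slack.
    void* raw = std::malloc(size + sizeof(void*) + alignment);
    if (!raw) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    void** user = reinterpret_cast<void**>(aligned);
    user[-1] = raw;   // remember what malloc returned, just before the payload
    return user;
}

// Frees a block obtained from aligned_malloc_sketch.
inline void aligned_free_sketch(void* ptr) {
    if (ptr) std::free(static_cast<void**>(ptr)[-1]);
}
```

The caller never sees the address malloc returned, which is why it has to be stashed in the slot just before the aligned block: free must be given exactly that original address.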

How portable is using the low bit of a pointer as a flag?

Suppose, for example, there is a class that requires a pointer and a bool. For simplicity an int pointer will be used in the examples, but the pointer type is irrelevant as long as it points to something whose sizeof is more than 1.
Defining the class with { bool, int * } data members will result in the class having a size that is double the size of the pointer, and a lot of wasted space.
If the pointer does not point to a char (or other data with sizeof equal to 1), then presumably the low bit will always be zero. The class could be defined with { int * } or, for convenience, union { int *, uintptr_t }.
The bool is implemented by setting or clearing the low bit of the pointer according to the logical bool value, and clearing the bit whenever you need to use the pointer.
The defined way:
struct myData
{
int * ptr;
bool flag;
};
myData x;
// initialize
x.ptr = new int;
x.flag = false;
// set flag true
x.flag = true;
// set flag false
x.flag = false;
// use ptr
*(x.ptr)=7;
// change ptr
x.ptr = y; // y is another int *
And the proposed way:
union tiny
{
int * ptr;
uintptr_t flag;
};
tiny x;
// initialize
x.ptr = new int;
// set flag true
x.flag |= 1;
// set flag false
x.flag &= ~1;
// use ptr
tiny clean=x; // note that clean will likely be optimized out
clean.flag &= ~1; // back to original value as assigned to ptr
*(clean.ptr)=7;
// change ptr
bool flag = x.flag & 1; // keep only the tag bit, not the whole address
x.ptr = y; // y is another int *
x.flag |= flag;
This seems to be undefined behavior, but how portable is this?
As long as you restore the pointer's low-order bit before trying to use it as a pointer, it's likely to be "reasonably" portable, as long as your system, your C++ implementation, and your code meet certain assumptions.
I can't necessarily give you a complete list of assumptions, but off the top of my head:
It assumes you're not pointing to anything whose size is 1 byte. This excludes char, unsigned char, signed char, int8_t, and uint8_t. (And that assumes CHAR_BIT == 8; on exotic systems with, say, 16-bit or 32-bit bytes, other types might be excluded.)
It assumes objects whose size is at least 2 bytes are always aligned at an even address. Note that x86 doesn't require this; you can access a 4-byte int at an odd address, but it will be slightly slower. But compilers typically arrange for objects to be stored at even addresses. Other architectures may have different requirements.
It assumes a pointer to an even address has its low-order bit set to 0.
For that last assumption, I actually have a concrete counterexample. On Cray vector systems (J90, T90, and SV1 are the ones I've used myself) a machine address points to a 64-bit word, but the C compiler under Unicos sets CHAR_BIT == 8. Byte pointers are implemented in software, with the 3-bit byte offset within a word stored in the otherwise unused high-order 3 bits of the 64-bit pointer. So a pointer to an 8-byte aligned object could easily have its low-order bit set to 1.
There have been Lisp implementations that use the low-order 2 bits of pointers to store a type tag. I vaguely recall this causing serious problems during porting.
Bottom line: You can probably get away with it for most systems. Future architectures are largely unpredictable, and I can easily imagine your scheme breaking on the next Big New Thing.
Some things to consider:
Can you store the boolean values in a bit vector outside your class? (Maintaining the association between your pointer and the corresponding bit in the bit vector is left as an exercise).
Consider adding code to all pointer operations that fails with an error message if it ever sees a pointer with its low-order bit set to 1. Use #ifdef to remove the checking code in your production version. If you start running into problems on some platform, build a version of your code with the checks enabled and see what happens.
I suspect that, as your application grows (they seldom shrink), you'll want to store more than just a bool along with your pointer. If that happens, the space issue goes away, because you're already using that extra space anyway.
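The suggestion to check every pointer for a set low-order bit can be folded into a small wrapper. The sketch below is one possible shape, assuming that converting a pointer to uintptr_t and back preserves it when the low bit is masked off again (implementation-defined, as discussed above). FlaggedIntPtr is a hypothetical name. It uses assert, so building with NDEBUG removes the checks, much like the #ifdef approach described.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: packs an int* and a bool into one pointer-sized word.
class FlaggedIntPtr {
public:
    explicit FlaggedIntPtr(int* p = nullptr, bool flag = false) { reset(p, flag); }

    void reset(int* p, bool flag) {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
        // Fails loudly on a platform where the low bit is not free.
        assert((bits & 1u) == 0 && "low bit of pointer is already in use");
        bits_ = bits | (flag ? 1u : 0u);
    }

    void set_ptr(int* p) { reset(p, flag()); }   // keeps the current flag
    void set_flag(bool flag) {
        bits_ = (bits_ & ~static_cast<std::uintptr_t>(1)) | (flag ? 1u : 0u);
    }

    bool flag() const { return (bits_ & 1u) != 0; }
    int* get() const {
        return reinterpret_cast<int*>(bits_ & ~static_cast<std::uintptr_t>(1));
    }
    int& operator*() const { return *get(); }

private:
    std::uintptr_t bits_ = 0;
};

static_assert(sizeof(FlaggedIntPtr) == sizeof(int*),
              "the flag should cost no extra space");
```

Keeping the masking inside get() means callers never see a tagged address, which removes the most likely source of bugs in the hand-written union version.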
In "theory": it's undefined behavior as far as I know.
In "reality": it'll work on everyday x86/x64 machines, and probably ARM too?
I can't really make a statement beyond that.
It's very portable, and furthermore, you can assert when you accept the raw pointer to make sure it meets the alignment requirement. This will insure against the unfathomable future compiler that somehow messes you up.
Only reasons not to do it are the readability cost and general maintenance associated with "hacky" stuff like that. I'd shy away from it unless there's a clear gain to be made. But it is sometimes totally worth it.
Conform to those rules and it should be very portable.

memory usage of in class - converting double to float didn't reduce memory usage as expected

I am initializing millions of objects of the following type
template<class T>
struct node
{
//some functions
private:
T m_data_1;
T m_data_2;
T m_data_3;
node* m_parent_1;
node* m_parent_2;
node* m_child;
};
The purpose of the template is to enable the user to choose float or double precision, with the idea being that node<float> will occupy less memory (RAM).
However, when I switch from double to float the memory footprint of my program does not decrease as I expect it to. I have two questions:
Is it possible that the compiler/operating system is reserving more space than required for my floats (or even storing them as doubles)? If so, how do I stop this happening? I'm using Linux on a 64-bit machine with g++.
Is there a tool that lets me determine the amount of memory used by all the different classes (i.e. some sort of memory profiling), to make sure that the memory isn't being gobbled up somewhere else that I haven't thought of?
If you are compiling for 64-bit, then each pointer will be 64-bits in size. This also means that they may need to be aligned to 64-bits. So if you store 3 floats, it may have to insert 4 bytes of padding. So instead of saving 12 bytes, you only save 8. The padding will still be there whether the pointers are at the beginning of the struct or the end. This is necessary in order to put consecutive structs in arrays to continue to maintain alignment.
Also, your structure is primarily composed of 3 pointers. The 8 bytes you save take you from a 48-byte object to a 40 byte object. That's not exactly a massive decrease. Again, if you're compiling for 64-bit.
If you're compiling for 32-bit, then you're saving 12 bytes from a 36-byte structure, which is better percentage-wise. Potentially more if doubles have to be aligned to 8 bytes.
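The padding arithmetic above can be checked directly with offsetof. The sketch below uses a copy of the question's struct with public members (NodeLayout, a hypothetical name) so that offsetof applies; the concrete numbers in the answer assume 8-byte pointers.

```cpp
#include <cassert>
#include <cstddef>

// Same member order as the question's node, but public so offsetof applies.
template <class T>
struct NodeLayout {
    T data_1;
    T data_2;
    T data_3;
    NodeLayout* parent_1;
    NodeLayout* parent_2;
    NodeLayout* child;
};

// Bytes of padding the compiler inserted between data_3 and parent_1.
template <class T>
constexpr std::size_t padding_before_pointers() {
    return offsetof(NodeLayout<T>, parent_1) - 3 * sizeof(T);
}
```

With 8-byte pointers, padding_before_pointers<float>() would be expected to report the 4 bytes of padding described above, and padding_before_pointers<double>() none.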
The other answers are correct about the source of the discrepancy. However, pointers (and other types) on x86/x86-64 are not required to be aligned. It is just that performance is better when they are, which is why GCC keeps them aligned by default.
But GCC provides a "packed" attribute to let you exert control over this:
#include <iostream>
template<class T>
struct node
{
private:
T m_data_1;
T m_data_2;
T m_data_3;
node* m_parent_1;
node* m_parent_2;
node* m_child;
} ;
template<class T>
struct node2
{
private:
T m_data_1;
T m_data_2;
T m_data_3;
node2* m_parent_1;
node2* m_parent_2;
node2* m_child;
} __attribute__((packed));
int
main(int argc, char *argv[])
{
std::cout << "sizeof(node<double>) == " << sizeof(node<double>) << std::endl;
std::cout << "sizeof(node<float>) == " << sizeof(node<float>) << std::endl;
std::cout << "sizeof(node2<float>) == " << sizeof(node2<float>) << std::endl;
return 0;
}
On my system (x86-64, g++ 4.5.2), this program outputs:
sizeof(node<double>) == 48
sizeof(node<float>) == 40
sizeof(node2<float>) == 36
Of course, the "attribute" mechanism and the "packed" attribute itself are GCC-specific.
In addition to the valid points that Nicol makes:
When you call new/malloc, it doesn't necessarily correspond 1 to 1 with a call to the OS to allocate memory. This is because, in order to reduce the number of expensive system calls, the heap manager may allocate more than is requested, and then "suballocate" chunks of that when you call new/malloc. Also, the OS hands out memory in whole pages (typically 4 KB, the minimum page size). Essentially, there may be chunks of memory allocated that are not currently actively used, in order to speed up future allocations.
To answer your questions directly:
1) Yes, the runtime will very likely allocate more memory than you asked for - but this memory is not wasted; it will be used for future news/mallocs, but will still show up in "task manager" or whatever tool you use. No, it will not promote floats to doubles. The more allocations you make, the less likely this edge condition will be the cause of the size difference, and the items in Nicol's answer will dominate. For a smaller number of allocations, this item is likely to dominate (where "large" and "small" depend entirely on your OS and kernel).
2) The windows task manager will give you the total memory allocated. Something like WinDbg will actually give you the virtual memory range chunks (usually allocated in a tree) that were allocated by the run-time. For Linux, I expect this data will be available in one of the files in the /proc directory associated with your process.
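On Linux, one of those /proc files is /proc/self/status, which normally contains a VmRSS line giving the resident set size in kB. A minimal sketch for reading it from inside the program, assuming that file and line format are present (resident_set_kib is a hypothetical helper name):

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns the process's resident set size in KiB, or -1 if it cannot be read.
// Assumes a Linux-style /proc/self/status containing a "VmRSS:" line.
inline long resident_set_kib() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kib = -1;
            if (fields >> kib) {
                assert(kib >= 0);   // a resident size is never negative
                return kib;
            }
            return -1;              // line present but not parseable
        }
    }
    return -1;                      // file missing or no VmRSS line
}
```

Sampling this before and after building the nodes gives a rough figure for what the whole process gained, including allocator overhead, which is the number that did not shrink as expected.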

What exactly do pointers store? (C++)

I know that pointers store the address of the value that they point to, but if you display the value of a pointer directly to the screen, you get a hexadecimal number. If the number is exactly what the pointer stores, then when saying
pA = pB; //both are pointers
you're copying the address. Then wouldn't there be a bigger overhead to using pointers when working with very small items like ints and bools?
A pointer is essentially just a number. It stores the address in RAM where the data is. The pointer itself is pretty small (probably the same size as an int on 32 bit architectures, long on 64 bit).
You are correct though that an int * would not save any space when working with ints. But that is not the point (no pun intended). Pointers are there so you can have references to things, not just use the things themselves.
Memory addresses.
That is the locations in memory where other stuff is.
Pointers are generally the word size of the processor, so they can generally be moved around in a single instruction cycle. In short, they are fast.
As others have said, a pointer stores a memory address which is "just a number", but that is an abstraction. Depending on processor architecture it may be more than one number, for instance a base and offset that must be added to dereference the pointer. In this case the overhead is slightly higher than if the address is a single number.
Yes, there is overhead in accessing an int or a bool via a pointer vs. directly, where the processor can put the variable in a register. Pointers are usually used where the value of the indirection outweighs any overhead, i.e. traversing an array.
I've been referring to time overhead. Not sure if OP was more concerned space or time overhead.
The number refers to its address in memory. The size of a pointer is typically the native size of the computer's architecture so there is no additional overhead compared to any other primitive type.
On some architectures there is an additional overhead of pointers to characters because the architecture only supports addressing words (32- or 64-bit values). A pointer to a character is therefore stored as a word address and an offset of the character within that word. De-referencing the pointer involves fetching the word and then shifting and masking its value to extract the character.
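The fetch, shift and mask sequence described above can be modelled in ordinary C++. This is only an illustration of the idea on a simulated word-addressed memory; BytePointer and load_byte are hypothetical names, and the byte numbering (offset 0 is the least significant byte) is an arbitrary choice for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a character pointer on a word-addressed machine:
// a word index plus the byte's position inside that 64-bit word.
struct BytePointer {
    std::size_t word_index;
    unsigned byte_offset;   // 0..7, counted from the least significant byte
};

inline std::uint8_t load_byte(const std::uint64_t* memory, BytePointer p) {
    assert(p.byte_offset < 8);
    std::uint64_t word = memory[p.word_index];              // fetch the whole word
    std::uint64_t shifted = word >> (8 * p.byte_offset);    // shift the byte down
    return static_cast<std::uint8_t>(shifted & 0xFFu);      // mask off the rest
}
```

A plain word load needs only the first of those three steps, which is the extra cost the answer refers to.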
Let me start from the basics. First of all, you will have to know what variables are and how they are used.
Variables are basically memory locations(usually containing some values) and we use some identifier(i.e., variable names) to refer to that memory location and use the value present at that location.
For understanding it better, suppose we want the information from memory cells present at some location relative to the current variable. Can we use the identifier to extract information from nearby cells?
No. Because the identifier(variable name) will only give the value contained in that particular cell.
But, If somehow we can get the memory address at which this variable is present then we can easily move to nearby locations and use their information as well(at runtime).
This is where pointers come into play. They are used to store the location of that variable so that we can use the additional address information whenever required.
Syntax: To store the address of a variable we can simply use & (address-of) operator.
foo = &bar
Here foo stores the address of variable bar.
Now, what if we want to know the value present at that address?
For that, we can simply use the * (dereference) operator.
value = *foo
Now that we have to store the address of a variable, we'll be needing the memory the same way as we need in case of a variable. This means pointers are also stored in the memory the same way as other variables, so just like in case of variables, we can also store the address of a pointer into yet another pointer.
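A short sketch of that last point, storing the address of a pointer in yet another pointer. The names foo and bar follow the text above; baz and the function name are hypothetical additions.

```cpp
#include <cassert>

// Writes through two levels of indirection and returns what bar ends up holding.
inline int double_indirection_demo() {
    int bar = 5;
    int* foo = &bar;      // foo stores the address of bar
    int** baz = &foo;     // baz stores the address of foo
    assert(*baz == foo);  // one dereference yields the inner pointer
    **baz = 42;           // two dereferences reach bar itself
    return bar;
}
```

Because foo is itself an object in memory, &foo is a perfectly ordinary address, and the same & and * operators apply at every level.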
An address in memory. Points to somewhere! :-)
Yes, you're right, both in terms of speed and memory.
Pointers almost always take up more bytes than your standard int and, especially, bool and char data types. On modern machines pointers typically are 8 bytes while char is almost always just 1 byte.
In this example, accessing the char and bool from Foo requires more machine instructions than accessing them from Bar:
struct Foo
{
char * c; // single character
bool * b; // single bool
};
struct Bar
{
char c;
bool b;
};
... And if we decide to make some arrays, then the arrays of Foo would be 8 times larger - and the data is more spread apart, which means you'll end up with a lot more cache misses.
#include <vector>
int main()
{
int size = 1000000;
std::vector<Foo> foo(size);
std::vector<Bar> bar(size);
return 0;
}
As dmckee pointed out, a single copy of a one-byte bool and a single copy of a pointer are just as fast:
bool num1, num2,* p1, * p2;
num1 = num2; // this takes one clock cycle
p1 = p2; // this takes another
As dmckee said, this is true when you're using a 64-bit architecture.
However, copying of arrays of ints, bools and chars can be much faster, because we can squeeze multiples of them onto each register:
#include <cstdint> // int64_t
int main ()
{
const int n_elements = 100000 * sizeof(int64_t);
bool A[n_elements];
bool B[n_elements];
int64_t * A_fast = (int64_t *) A;
int64_t * B_fast = (int64_t *) B;
const int n_quick_elements = n_elements / sizeof(int64_t);
for (int i = 0; i < 10000; ++i)
for (int j = 0; j < n_quick_elements; ++j)
A_fast[j] = B_fast[j];
return 0;
}
The STL containers and other good libraries do this sort of thing for us, using type_traits (is_trivially_copyable) and std::memcpy. Using pointers under the false guise that they're always just as fast can prevent those libraries from optimising.
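As a rough illustration of what such a library does internally, the sketch below guards a bulk std::memcpy with std::is_trivially_copyable. bulk_copy is a hypothetical name; real standard library implementations are more elaborate, and in practice std::copy is the call to reach for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copies `count` elements in one bulk operation instead of one at a time.
template <class T>
void bulk_copy(const T* src, T* dst, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    if (count == 0) return;
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, count * sizeof(T));   // ranges must not overlap
}
```

Unlike the int64_t cast in the example above, this does not read bool objects through an unrelated pointer type, so it avoids the aliasing question while still letting the library copy many bytes per instruction.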
Conclusion: It may seem obvious with these examples, but only use pointers/references on basic data types when you need to take/give access to the original object.