C++ `std::string`-like container with 4-byte aligned buffer - c++

I need a data structure in C++ that acts like a standard container of bytes but aligns the buffer at a multiple of four bytes. I'd like to re-use standard library abstractions as much as possible, rather than rolling my own abstraction.
Until now, I had been using std::string and std::vector<std::uint8_t> for this purpose. Unfortunately, I've gotten bug reports on the latest Mac OS, where apparently string::data() is no longer 4-byte aligned, but rather at an address congruent to 1 mod 4. As soon as I saw this, I realized that of course nothing in the spec guarantees strings will be 4-byte aligned. I could switch over to vector<char>, but now I'm not sure why that should be 4-byte aligned either. Potentially, even with a custom allocator, the vector implementation could do something strange at the beginning of the buffer it allocates.
My question: What is a simple way of getting a dynamically-sized container of single-byte objects from the C++ standard library in which the first byte is at a 4-byte aligned address and individual bytes can be accessed through operator[]?
Note that this is not the same thing as asking how to ensure that the allocator used by the container returns 4-byte aligned memory. For example, std::string still allocates 4-byte aligned memory (probably 8, actually), it's just that on Mac OS string::data() does not point to the start of the allocated buffer. I don't see anything in the spec that would prevent a vector<char> from doing the same thing, even though for now that seems to work.

One solution is to use std::vector<uint32_t> internally, wrap it in your own class, and cast data() to unsigned char * when you need byte access.
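A minimal sketch of that idea follows. The class name AlignedBytes and its interface are hypothetical, and the sketch assumes alignof(std::uint32_t) is 4, which holds on mainstream platforms but is not guaranteed by the standard (the static_assert checks it).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wrapper: bytes are stored inside a vector of 32-bit words,
// so the first byte shares the alignment of a std::uint32_t element.
class AlignedBytes {
public:
    static_assert(alignof(std::uint32_t) >= 4, "platform does not 4-byte align uint32_t");

    explicit AlignedBytes(std::size_t n = 0) { resize(n); }

    void resize(std::size_t n) {
        words_.resize((n + 3) / 4);  // round up to whole 32-bit words
        size_ = n;
    }

    std::size_t size() const { return size_; }

    // unsigned char may alias any object, so viewing the words as bytes is allowed.
    unsigned char* data() {
        return reinterpret_cast<unsigned char*>(words_.data());
    }
    const unsigned char* data() const {
        return reinterpret_cast<const unsigned char*>(words_.data());
    }

    unsigned char& operator[](std::size_t i) {
        assert(i < size_);
        return data()[i];
    }
    const unsigned char& operator[](std::size_t i) const {
        assert(i < size_);
        return data()[i];
    }

private:
    std::vector<std::uint32_t> words_;
    std::size_t size_ = 0;
};
```

Because data() is derived from the address of the first uint32_t element, the alignment does not depend on where the vector places its elements inside the block it allocated.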

Related

Do ARM-CPUs need special pointer-decoration for unaligned accesses?

Do ARM CPUs that support unaligned memory accesses need special pointer decoration for unaligned accesses in C/C++? Or can I use every pointer for unaligned accesses? Or is this compiler-dependent?
In short, this is compiler-dependent since it is not covered by the C standard.
However, as noted in the comments some ARM instructions require an aligned pointer and any ARM compiler would need to implement some alignment strategy. Since ARM processors work much more efficiently with aligned access it is likely that the compiler will normally ensure that data is aligned.
It is also likely that the compiler provides ways of working with non-aligned data (of course, this would again be compiler-defined behavior implementing what would be undefined behavior in the C standard). Common examples are packed structures and casting of pointers.
Let us look at a few cases:
__packed struct
{
    char a;
    int i;
} s;
In this case, &s.i is likely to be an unaligned pointer which is fine because the compiler knows that and can generate code accordingly.
char buffer[80];
void decode(int *i)
{
    int n = i[0];
    ...
}
In this case buffer may not be aligned (as an array of chars, it has no need to be). However, if the compiler normally aligns ints, it will assume that the pointer i in decode() is aligned and may generate code based on that assumption.
In that case, calling decode((int *)buffer) could lead to a hard fault in the processor.
Hence, the longer answer is that (at least in the cases I know of) there is no visible "decoration" for aligned/unaligned pointers, but the compiler may make assumptions based on the type and origin of the pointer and thus have a kind of internal "decoration" of pointers. In that case it is important to avoid "cheating" the compiler into making a wrong assumption.
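One way to avoid giving the compiler a wrong assumption in the decode() example is to never form an int* into the char buffer at all. The sketch below uses a hypothetical helper, decode_int, that copies the bytes into a properly aligned local instead.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: read an int stored at any byte offset.
// The source is only ever treated as bytes, so no misaligned int* exists.
int decode_int(const char* bytes) {
    assert(bytes != nullptr);
    int n;
    std::memcpy(&n, bytes, sizeof n);  // byte-wise copy into an aligned local
    return n;
}
```

With this, decode_int(buffer + 1) is fine even though buffer + 1 is very unlikely to be int-aligned.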
A standard-conforming C/C++ program cannot perform an unaligned pointer access, because you cannot legally form such a pointer. The compiler guarantees that with appropriate padding in structures and with malloc/new returning suitably aligned memory blocks.
But all compilers are a bit looser about this, and you can create an unaligned pointer by casting e.g. a char* to int* when the value is not aligned. Strictly speaking this is undefined behavior in C (the result of the conversion must be correctly aligned for the target type), so you are already on shaky ground.
Worse (for you) is that on ARM the CPU doesn't like unaligned access and has a flag that will make any attempt to access memory unaligned cause a CPU exception. Not every OS sets this flag to fault but you can't assume it is not set. The next OS update might set the flag. The compiler and you have to generate code that only uses aligned access.
Now sometimes you do have data that is not aligned and there are basically only 2 ways to access it safely:
char *buf = ....;
uint32_t t;
memcpy(&t, &buf[123], sizeof(t));
or
struct [[gnu::packed]] DiskLayout { // replace with your compiler's "packed" attribute
    ...
    uint32_t magic;
    ...
};
struct DiskLayout disk;
read_from_disk(&disk);
uint32_t magic = disk.magic;
In the first case memcpy() makes no alignment assumptions about its arguments, so the copy is safe whatever the address is; the compiler or library picks a suitable copy strategy.
In the second case the packed attribute forces the compiler to not add any padding. It will also force the alignment of the structure to 1. So disk.magic will be 1-byte aligned and the compiler has to generate code accordingly. On a target without unaligned loads that means it has to read 4 individual bytes and combine them back into a 32-bit value. As you can imagine this is much much slower than a single 32-bit read. Similarly, on a write the compiler has to split the value and write 4 individual bytes.
So the basic rule for unaligned access is: Always work on aligned copies of the data. Only ever read or write the value once.
If you want to use the packed attribute have a packed struct and a normal struct. Copy from the packed struct to the normal one, work on it and then copy it back. Don't work on the packed struct, the code will be much much slower.
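A sketch of that copy-in, work, copy-out pattern is shown below. The struct names and fields are hypothetical, and #pragma pack(push, 1) stands in for whatever packing mechanism your compiler offers (GCC, Clang and MSVC all accept this pragma).

```cpp
#include <cstdint>

#pragma pack(push, 1)
struct PackedHeader {        // hypothetical wire/disk layout, alignment 1
    std::uint8_t  tag;
    std::uint32_t magic;     // at offset 1, so misaligned
    std::uint16_t version;
};
#pragma pack(pop)

struct Header {              // naturally aligned working copy
    std::uint8_t  tag;
    std::uint32_t magic;
    std::uint16_t version;
};

static_assert(sizeof(PackedHeader) == 7, "packing pragma was not honored");

// Each packed field is read exactly once, by value.
Header unpack(const PackedHeader& p) {
    return Header{p.tag, p.magic, p.version};
}

// Each packed field is written exactly once.
PackedHeader pack(const Header& h) {
    return PackedHeader{h.tag, h.magic, h.version};
}
```

All the real work then happens on Header, where the compiler can use ordinary aligned loads and stores.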

Memory allocation for struct that has a STL class c++

I'm running tests to see how variables are getting placed on the memory and sizing them out when I use a struct.
Consider I have a struct that looks like below:
typedef struct _ttmp
{
    WCHAR wcsTest1[13];
    WCHAR wcsTest2[13];
    wstring wstr;
} TTMP, *LPTTMP;
How big is the size of TTMP when STL classes like wstring are dynamically allocated?
Am I treating wstr as a 4-byte pointer?
I ran some tests to see the size of TTMP and got the size of the struct to be 88-bytes
and two of WCHAR arrays were 26-bytes which leaves the size of wstr to be 36-bytes, but that 36-bytes does not really make sense if I were to treat the wstring as a pointer. Seems like alignment padding does not apply here since I'm only using 32-bit variables.
Also, would it be a bad practice to use the ZeroMemory API on structs with STL members?
I've heard from someone that it is not safe to use the API, but the program ran fine when I tested it.
sizeof(WCHAR[13])=26, which is not an even multiple of 4 or 8, so alignment padding would account for a few bytes (unless you set the struct's alignment to 1-2 bytes via #pragma pack(1) or #pragma pack(2) or equivalent).
But, std::(w)string is much more than just a single pointer. It may have a few pointers in it, or at least a pointer and a couple of integers, which it uses to determine its c_str()/data(), size() and capacity() values. And it may even have an internal fixed buffer for use in Small String Optimization, which avoids dynamic memory allocation of small string values (this is likely true in your situation). So, at a minimum, a std::(w)string instance could be as few as, say, 8 bytes, or it could be as many as, say, 40 bytes, depending on its internal implementation.
See Exploring std::string for more details.
In your case, sizeof(std::wstring) is likely 32 (that is what gcc uses, for instance). So, 88-26-26-32=4, and those 4 bytes could easily be accounted for by alignment padding.
And yes, ZeroMemory() would be very bad (ie, undefined behavior) to use on a struct containing non-POD members.
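A safe alternative to ZeroMemory() here is value-initialization, which zeroes the arrays and runs the string's constructor properly. The sketch below uses a hypothetical Record struct with wchar_t in place of WCHAR so that it is not tied to the Windows headers.

```cpp
#include <string>

struct Record {              // hypothetical stand-in for TTMP
    wchar_t test1[13];
    wchar_t test2[13];
    std::wstring text;
};

// The struct is at least as large as its members; any excess is padding.
static_assert(sizeof(Record) >= 2 * sizeof(wchar_t[13]) + sizeof(std::wstring),
              "members must fit inside the struct");

Record make_empty_record() {
    return Record{};         // arrays are zeroed, text is default-constructed
}
```

Printing sizeof(Record) and sizeof(std::wstring) on your own toolchain shows how much of the total is the string object and how much is padding.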

Memory allocation of C++ vector<bool>

The vector<bool> class in the C++ STL is optimized for memory to allocate one bit per bool stored, rather than one byte. Every time I output sizeof(x) for vector<bool> x, the result is 40 bytes for the vector structure itself. sizeof(x.at(0)) always returns 16 bytes, which must be the allocated memory for many bool values, not just the one at position zero. How many elements do the 16 bytes cover? 128 exactly? What if my vector has more or fewer elements?
I would like to measure the size of the vector and all of its contents. How would I do that accurately? Is there a C++ library available for viewing allocated memory per variable?
I don't think there's any standard way to do this. The only information a vector<bool> implementation gives you about how it works is the reference member type, but there's no reason to assume that this has any congruence with how the data are actually stored internally; it's just that you get a reference back when you dereference an iterator into the container.
So you've got the size of the container itself, and that's fine, but to get the amount of memory taken up by the data, you're going to have to inspect your implementation's standard library source code and derive a solution from that. Though, honestly, this seems like a strange thing to want in the first place.
Actually, using vector<bool> is kind of a strange thing to want in the first place. All of the above is essentially why its use is frowned upon nowadays: it's almost entirely incompatible with conventions set by other standard containers… or even those set by other vector specialisations.
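Note that sizeof(x.at(0)) measures the proxy object of type vector<bool>::reference, not any stored data. If a rough figure is all you need, the sketch below estimates the footprint from capacity(). The function name is hypothetical, and the one-bit-per-element packing is an assumption that the standard permits but does not require, so treat the result as an estimate rather than a measurement.

```cpp
#include <cstddef>
#include <vector>

// Rough estimate only. Assumption: the implementation packs one bit per
// element and allocates no extra bookkeeping on the heap.
std::size_t approx_footprint_bytes(const std::vector<bool>& v) {
    const std::size_t payload = (v.capacity() + 7) / 8;  // bits rounded up to bytes
    return sizeof(v) + payload;
}
```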

std::string and data alignment

I'm planning to use std::string as a generic data buffer (instead of rolling my own). I need to pack all kinds of POD into it, including user-defined structs. Is the memory buffer allocated by std::string properly aligned for such a purpose?
If it's unspecified in the C++ standard, what's the situation in libstdc++?
The host CPU is x86_64.
First of all, std::string is probably not the best container to use if what you want to do is store arbitrary data. I'd suggest using std::vector instead.
Second, the alignment of all allocations made by the container is controlled by its allocator (the second template parameter of std::vector, which defaults to std::allocator<T>). The default allocator returns memory suitably aligned for T and, in practice, obtains it from operator new, which aligns for any fundamental type, i.e. to alignof(std::max_align_t). That is typically the alignment of long long or long double, respectively 8 and 16 bytes on my machine, but these values are not mandated by the standard.
If you want a specific alignment you should either check what your compiler aligns on, or ask for alignment explicitly, by supplying your own allocator or using std::aligned_storage.
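As an illustration of the "supply your own allocator" route, here is a minimal sketch that assumes C++17 (for the aligned forms of operator new and operator delete). The name AlignedAllocator is hypothetical. Note that it only controls the address of the allocated block; in common implementations std::vector places its first element there, but other containers may not.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical minimal allocator that over-aligns every allocation.
template <class T, std::size_t Alignment>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0, "alignment must be a power of two");
    static_assert(Alignment >= alignof(T), "alignment too small for T");

    using value_type = T;

    // Needed because the non-type parameter defeats the default rebind.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = ::operator new(n * sizeof(T), static_cast<std::align_val_t>(Alignment));
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, static_cast<std::align_val_t>(Alignment));
    }
};

// Stateless allocators of the same alignment are interchangeable.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }
```

A buffer would then be declared as, for example, std::vector<unsigned char, AlignedAllocator<unsigned char, 16>>.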

Is std::make_unique<T[]> required to return aligned memory?

Is the memory owned by the unique pointer array_ptr:
auto array_ptr = std::make_unique<double[]>(size);
aligned to an alignof(double) boundary (i.e. is it required by the standard to be correctly aligned)?
Is the first element of the array the first element of a cache line?
Otherwise: what is the correct way of achieving this in C++14?
Motivation (update): I plan to use SIMD instructions on the array and since cache lines are the basic unit of memory on every single architecture that I know of I'd rather just allocate memory correctly such that the first element of the array is at the beginning of a cache line. Note that SIMD instructions work as long as the elements are correctly aligned (independently of the position of the elements between cache lines). However, I don't know if that has an influence at all but I can guess that yes, it does. Furthermore, I want to use these SIMD instructions on my raw memory inside a kernel. It is an optimization detail of a kernel so I don't want to allocate e.g. __int128 instead of int.
All objects that you obtain "normally" are suitably aligned, i.e. aligned at alignof(T) (which need not be the same as sizeof(T)). That includes dynamic arrays. (Typically, the allocator ::operator new will just return a maximally aligned address so as not to have to worry about how the memory is used.)
There are no cache lines in C++. This is a platform specific issue that you need to deal with yourself (but alignas may help).
Try alignas plus a static check if it works (since support for over-aligned types is platform dependent), otherwise just add manual padding. You don't really care whether your data is at the beginning of a cache line, only that no two data elements are on the same cache line.
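A sketch of the alignas-plus-static-check approach is below. The 64-byte line size and the names kCacheLine and PaddedCounter are assumptions for illustration; use the value that matches your target.

```cpp
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines

// Over-aligned element: sizeof is rounded up to a multiple of the alignment,
// so consecutive array elements never share a cache line.
struct alignas(kCacheLine) PaddedCounter {
    long value;
};

// Static checks: these fail to compile if the over-alignment is not honored.
static_assert(alignof(PaddedCounter) == kCacheLine, "over-alignment not applied");
static_assert(sizeof(PaddedCounter) % kCacheLine == 0, "elements would share a line");
```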
It is worth stressing that alignment isn't actually a concept you can check directly in C++, since pointers are not numbers. They are convertible to numbers, but the conversion is not generally meaningful other than being reversible. You need something like std::align to actually say "I have aligned memory", or just use alignas on your types directly.
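For raw memory, std::align does the address adjustment for you. The sketch below wraps it in a hypothetical helper, first_aligned. The is_aligned check next to it relies on the pointer-to-integer conversion behaving like a flat address, which is an assumption that holds on mainstream platforms but is exactly the non-portable step the paragraph above warns about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: first address in [buf, buf + size) that is aligned to
// `alignment` and still has room for `bytes` bytes; nullptr if nothing fits.
void* first_aligned(void* buf, std::size_t size, std::size_t alignment, std::size_t bytes) {
    void* p = buf;
    std::size_t space = size;
    return std::align(alignment, bytes, p, space);
}

// Platform assumption: uintptr_t values behave like flat addresses.
bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```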