Memory allocation for struct that has a STL class c++

I'm running tests to see how variables are getting placed on the memory and sizing them out when I use a struct.
Consider I have a struct that looks like below:
typedef struct _ttmp
{
WCHAR wcsTest1[13];
WCHAR wcsTest2[13];
wstring wstr;
}TTMP, *LPTTMP;
How big is the size of TTMP when STL classes like wstring are dynamically allocated?
Am I treating wstr as a 4-byte pointer?
I ran some tests and found that sizeof(TTMP) is 88 bytes. The two WCHAR arrays are 26 bytes each, which leaves 36 bytes for wstr, but 36 bytes does not make sense if the wstring were just a pointer. Alignment padding does not seem to apply here, since I'm only using 32-bit variables.
Also, would it be bad practice to use the ZeroMemory API on structs with STL members?
I've heard from someone that it is not safe to use the API, but the program ran fine when I tested it.

sizeof(WCHAR[13])=26, which is not an even multiple of 4 or 8, so alignment padding would account for a few bytes (unless you set the struct's alignment to 1-2 bytes via #pragma pack(1) or #pragma pack(2) or equivalent).
But, std::(w)string is much more than just a single pointer. It may have a few pointers in it, or at least a pointer and a couple of integers, which it uses to determine its c_str()/data(), size() and capacity() values. And it may even have an internal fixed buffer for use in Small String Optimization, which avoids dynamic memory allocation of small string values (this is likely true in your situation). So, at a minimum, a std::(w)string instance could be as few as, say, 8 bytes, or it could be as many as, say, 40 bytes, depending on its internal implementation.
See Exploring std::string for more details.
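In your case, sizeof(std::wstring) is likely 32 (that is what 64-bit gcc uses, for instance). The two arrays take 52 bytes, and if this is a 64-bit build the wstring holds pointers and must start on an 8-byte boundary, so the compiler inserts 4 bytes of padding before it: 26+26+4+32=88.
And yes, ZeroMemory() would be very bad (i.e., undefined behavior) to use on a struct containing non-POD members: it overwrites the string's internal state behind the back of its constructor and destructor.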
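The size arithmetic above can be checked directly. This is only a sketch: wchar_t stands in for WCHAR (an assumption, they are the same 2-byte type on Windows but wchar_t is 4 bytes on most other platforms), and the names Ttmp, padding_bytes and make_empty are made up for illustration. It also shows value-initialization as a safe replacement for ZeroMemory().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Same member layout as the question's TTMP; wchar_t replaces WCHAR (assumption).
struct Ttmp {
    wchar_t test1[13];
    wchar_t test2[13];
    std::wstring wstr;
};

// Bytes in Ttmp that belong to no member, i.e. alignment padding.
constexpr std::size_t padding_bytes() {
    return sizeof(Ttmp) - 2 * sizeof(wchar_t[13]) - sizeof(std::wstring);
}

// std::wstring has a non-trivial constructor and destructor, so the struct
// is not a trivial type and must not be wiped with ZeroMemory()/memset().
static_assert(!std::is_trivial<Ttmp>::value, "Ttmp is not a trivial type");

// Value-initialization zeroes both arrays and default-constructs wstr.
inline Ttmp make_empty() {
    Ttmp t{};
    assert(t.wstr.empty() && t.test1[0] == L'\0');
    return t;
}
```

Printing sizeof(std::wstring) and padding_bytes() on your own toolchain shows how the 88 bytes are split; the exact numbers depend on the compiler, the standard library and the build mode.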

Related

Do ARM-CPUs need special pointer-decoration for unaligned accesses?

Do ARM-CPUs that support unaligned memory accesses need special pointer-decoration for unaligned accesses in C / C++ ? Or can I use every pointer for unaligned accesses ? Or is this compiler-dependent ?
In short, this is compiler-dependent since it is not covered by the C standard.
However, as noted in the comments some ARM instructions require an aligned pointer and any ARM compiler would need to implement some alignment strategy. Since ARM processors work much more efficiently with aligned access it is likely that the compiler will normally ensure that data is aligned.
It is also likely that the compiler provides ways of working with non-aligned data (of course, this would again be compiler-defined behavior implementing what would be undefined behavior in the C standard). Common examples are packed structures and casting of pointers.
Let us look at a few cases:
__packed struct
{
char a;
int i;
} s;
In this case, &s.i is likely to be an unaligned pointer which is fine because the compiler knows that and can generate code accordingly.
char buffer[80];
void decode(int *i)
{
int n = i[0];
...
}
In this case buffer may not be aligned (as an array of chars, it has no need to be). However, if the compiler normally aligns ints, it will assume that the pointer i in decode() is aligned and may generate code based on that assumption.
In that case, calling decode((int *)buffer) could lead to a hard fault in the processor.
Hence, the longer answer is that (at least in the cases I know of) there is no visible "decoration" for aligned/unaligned pointers, but the compiler may make assumptions based on the type and origin of the pointer and thus have a kind of internal "decoration" of pointers. In that case it is important to avoid "cheating" the compiler into making a wrong assumption.
A standard-conforming C/C++ program cannot perform an unaligned pointer access, because you cannot legally form such a pointer. The compiler guarantees that with appropriate padding in structures and with malloc/new returning suitably aligned memory blocks.
But all compilers take this a bit looser, and you can create an unaligned pointer by casting e.g. a char* to int* when the value is not aligned. As far as the standard is concerned, accessing memory through such a pointer is undefined behavior, so you are already on shaky ground.
Worse (for you) is that on ARM the CPU doesn't like unaligned access and has a flag that makes any attempt to access memory unaligned cause a CPU exception. Not every OS sets this flag to fault, but you can't assume it is not set. The next OS update might set the flag. The compiler and you have to generate code that only uses aligned access.
Now sometimes you do have data that is not aligned and there are basically only 2 ways to access it safely:
char *buf = ....;
uint32_t t;
memcpy(&t, &buf[123], sizeof(t));
or
struct [[gnu::packed]] DiskLayout { // replace with your compiler's "packed" attribute
...
uint32_t magic;
...
};
struct DiskLayout disk;
read_from_disk(&disk);
uint32_t magic = disk.magic;
In the first case memcpy() makes no assumption about the alignment of either pointer, so the copy is safe wherever the data sits; compilers usually turn such a small fixed-size copy into whatever load sequence the target allows.
In the second case the packed attribute forces the compiler not to add any padding. It also forces the alignment of the structure to 1, so disk.magic is only 1-byte aligned and the compiler has to generate code accordingly. On a target without unaligned loads that means reading 4 individual bytes and combining them back into a 32-bit value, which is much slower than a single 32-bit read. Similarly, on a write the compiler has to split the value and write 4 individual bytes.
So the basic rule for unaligned access is: Always work on aligned copies of the data. Only ever read or write the value once.
If you want to use the packed attribute have a packed struct and a normal struct. Copy from the packed struct to the normal one, work on it and then copy it back. Don't work on the packed struct, the code will be much much slower.
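The memcpy approach is easy to wrap in two small helpers. This is a minimal sketch; the names load_u32 and store_u32 are invented here, and the value is copied in the host's byte order, so no endianness conversion takes place.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from an address that may be unaligned.
// memcpy copies byte-wise as far as the language is concerned, so no
// misaligned uint32_t access ever happens in the abstract machine.
inline std::uint32_t load_u32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Writes a 32-bit value to an address that may be unaligned.
inline void store_u32(unsigned char* p, std::uint32_t v) {
    assert(p != nullptr);
    std::memcpy(p, &v, sizeof v);
}
```

With these, the earlier decode() example would take a const unsigned char* and call load_u32(buffer + offset) instead of casting the buffer to int*.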

C++ `std::string`-like container with 4-byte aligned buffer

I need a data structure in C++ that acts like a standard container of bytes but aligns the buffer at a multiple of four bytes. I'd like to re-use standard library abstractions as much as possible, rather than rolling my own abstraction.
Until now, I had been using std::string and std::vector<std::uint8_t> for this purpose. Unfortunately, I've gotten bug reports on the latest Mac OS, where apparently string::data() is no longer 4-byte aligned, but rather at an address congruent to 1 mod 4. As soon as I saw this, I realized that of course nothing in the spec guarantees strings will be 4-byte aligned. I could switch over to vector<char>, but now I'm not sure why that should be 4-byte aligned either. Potentially, even with a custom allocator, the vector implementation could do something strange at the beginning of the buffer it allocates.
My question: What is a simple way of getting a dynamically-sized container of single-byte objects from the C++ standard library in which the first byte is at a 4-byte aligned address and individual bytes can be accessed through operator[]?
Note that this is not the same thing as asking how to ensure that the allocator used by the container returns 4-byte aligned memory. For example, std::string still allocates 4-byte aligned memory (probably 8, actually), it's just that on Mac OS string::data() does not point to the start of the allocated buffer. I don't see anything in the spec that would prevent a vector<char> from doing the same thing, even though for now that seems to work.
One solution is to use std::vector<uint32_t> internally, encapsulate it, and convert data() to unsigned char * when you use it.
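A rough sketch of that wrapper follows. The class name AlignedBytes and its interface are made up for illustration, and it assumes alignof(std::uint32_t) is 4, which holds on common platforms but is not promised by the standard. The byte count is tracked separately because the vector can only grow in whole 32-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte container whose storage is a vector of 32-bit words, so data()
// is aligned at least as strictly as std::uint32_t.
class AlignedBytes {
public:
    explicit AlignedBytes(std::size_t n = 0) { resize(n); }

    void resize(std::size_t n) {
        words_.resize((n + 3) / 4);  // round up to whole 32-bit words
        size_ = n;
    }

    std::size_t size() const { return size_; }

    // Viewing the words as unsigned char is allowed by the aliasing rules.
    unsigned char* data() {
        return reinterpret_cast<unsigned char*>(words_.data());
    }
    const unsigned char* data() const {
        return reinterpret_cast<const unsigned char*>(words_.data());
    }

    unsigned char& operator[](std::size_t i) {
        assert(i < size_);
        return data()[i];
    }
    const unsigned char& operator[](std::size_t i) const {
        assert(i < size_);
        return data()[i];
    }

private:
    std::vector<std::uint32_t> words_;
    std::size_t size_ = 0;
};
```

The trade-off is up to 3 unused bytes at the end of the buffer and a container that does not offer the full std::vector interface unless you forward it yourself.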

What does reinterpret_cast do binary-wise?

I'm writing a logger in C++, and I've come to the part where I'd like to take a log record and write in to a file.
I have created a LogRecord struct, and would like to serialize it and write it to a file in binary mode.
I have read some posts about serialization in C++, and one of the answers included this following snippet:
reinterpret_cast<char*>(&logRec)
I've tried reading about reinterpret_cast and what it does, but I couldn't fully understand what's really happening in the background.
From what I understand, it takes a pointer to my struct, and turns it into a pointer to a char, so it thinks that the chunk of memory that holds my struct is actually a string, is that true? How can that work?
A memory address is just a memory address. Memory isn't inherently special - it's just a huge array of bytes, for all we care. What gives memory its meaning is what we do with it, and the lenses through which we view it.
A pointer to a struct is just an integer that specifies some offset into memory - surely you can treat one integer in any way you want, in your case, as a pointer to some arbitrary number of bytes (chars).
reinterpret_cast() doesn't do anything special except allow you to convert one view of a memory address into another view of a memory address. It's still up to you to treat that memory address correctly.
For instance, char* is the conventional way to refer to a string of characters in C++ - but the type char* literally means "a pointer to a single char". How does it come to mean a pointer to a null-terminated string of characters? By convention, that's how. We treat the type differently depending on the context, but it's up to us to make sure we do so correctly.
For instance, how do you know how many bytes to read through your char* pointer to your struct? The type itself gives you zero information - it's up to you to know that you've really got a byte-oriented pointer to a struct of fixed length.
Remember, under the hood, the machine has no types. A piece of paper doesn't care if you write an essay on each line, or if you scribble all over the thing. It's how we treat it - and how the tools we use (C++) treat it.
Binary-wise, it does nothing at all. This casting is a higher-level concept that has no bearing in any actual machine instructions.
At a low level, a pointer is just a numeric value that holds a memory address. There is nothing to be done in telling the compiler "although you thought the destination memory contained a struct, now please think that it contains a char". The actual address itself doesn't change in any way.
From what I understand, it takes a pointer to my struct, and turns it into a pointer to a char, so it thinks that the chunk of memory that holds my struct is actually a string, is that true?
Yes.
How can that work?
A string is just a sequence of bytes, and your object is just a sequence of bytes, so that's how it works.
But it won't if your object is logically more than just a sequence of bytes. Any indirection, and you're hosed. Furthermore, any implementation-defined padding or representation/endianness and your data is non-portable. This might be acceptable; it really depends on your requirements.
Casting a struct into an array of bytes (chars) is a classic low-impact method of binary serialization. It rests on the assumption that the content of the struct exists contiguously in memory. The cast allows us to write this data to a file or socket using the normal APIs.
This only works though if the data is contiguous. This is true for C style structs or PODs in C++ terminology. It will not work with complex C++ objects or any struct with pointers to storage outside the struct. For text data you will need to use fixed size character arrays.
struct {
int num;
char name[50];
};
will serialize correctly.
struct {
int num;
char* name;
};
will not serialize correctly, since the data for the string is stored outside the struct.
If you are sending data across a network you will also need to ensure that the struct is packed, or at least of known alignment, and that integers are converted to a consistent endianness (network byte order is normally big endian).
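To make the cast concrete, here is a small sketch of writing and reading a fixed-size record through a binary stream. The LogRecord layout is hypothetical (the question does not show one), as are the helper names; the record is written in the host's byte order and with the host's padding, so the bytes are only meaningful to a reader built the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical record: fixed-size members only, no pointers.
struct LogRecord {
    std::int32_t level;
    char message[50];
};
static_assert(std::is_trivially_copyable<LogRecord>::value,
              "raw byte copies are only valid for trivially copyable types");

// Writes the object's bytes exactly as they sit in memory.
inline void write_record(std::ostream& out, const LogRecord& rec) {
    out.write(reinterpret_cast<const char*>(&rec), sizeof rec);
}

// Reads the bytes back; returns false if the stream held too few.
inline bool read_record(std::istream& in, LogRecord& rec) {
    in.read(reinterpret_cast<char*>(&rec), sizeof rec);
    return in.gcount() == static_cast<std::streamsize>(sizeof rec);
}

// Round-trips a record through an in-memory binary stream.
inline LogRecord round_trip(const LogRecord& rec) {
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    write_record(buf, rec);
    LogRecord out{};
    bool ok = read_record(buf, out);
    assert(ok);
    (void)ok;  // silences the unused-variable warning when NDEBUG is set
    return out;
}
```

Swapping the stringstream for a std::ofstream opened with std::ios::binary gives the file-based logger the question describes.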

How to fill in struct paddings in C/C++?

Suppose I have the following struct:
typedef struct {
int mID;
struct in_addr mIP;
size_t dataSize;
// Another structure
fairness_structure str;
bool ack;
bool stability;
bool stop_message;
}HeaderType;
As you know, the size of a struct would vary due to its alignment. How to fill in the padding between fields with some data, say with zeros?
Just initialize the structure with memset, and the padding will be filled as well.
memset(&mystruct, 0, sizeof(HeaderType));
If you really want to fill only the padding, you can cast the pointer to char* and do the arithmetic. But in this case you MUST know how the compiler padded the structure, or enforce it yourself with #pragma pack.
You can use offsetof() macro to get the offset of struct members.
char *off = (char *)&mystruct + offsetof(HeaderType, ack);
char *pad_start = off + sizeof(mystruct.ack);
char *pad_end = (char *)&mystruct + offsetof(HeaderType, stability);
Bedtime reading: The Lost Art of C Structure Packing
Controlling the contents of padding bits and bytes does not seem very useful. But if you write the contents of a structure to a file with a single write or fwrite call, you probably care about the padding and may want it to have consistent values, preferably 0, at all times. Not that it matters when you read the contents back from the file, but it makes the file contents predictable and reproducible. Some development tools are known to produce unpredictable contents in object or executable files exactly for this reason, making it very difficult to rebuild from source and check signatures.
So if you really need this, you want a simple and portable method.
The bad news is the C Standard does not have a generic solution for this.
The only guarantee the standard makes about the contents of padding bytes and bits is for uninitialized structures with static storage duration. Padding is guaranteed to be zero in this case (in a hosted environment). In practice, this is also true of initialized structures because it is simple enough for compiler writers to do so.
What about local structures with automatic storage? If they are not initialized, both fields and padding contents are indeterminate. If you just clear the bytes with a memset(&s, 0, sizeof(s)) the padding will be cleared and you can start modifying struct members... Bad news again: the C standard describes as Unspecified behaviour The value of padding bytes when storing values in structures or unions (6.2.6.1).
In other words, storing values in structure members can have side effects on the contents of padding bits and bytes. The compiler is allowed to generate code that does that and it may be more efficient to do so.
The method described by Marek beyond the simple memset is very cumbersome to use, especially if you have bitfields. In practice, clearing the structure before you initialize the fields manually seems the simplest way to achieve the purpose, and I have not seen a compiler that takes advantage of the Standard's leniency concerning the padding bytes. If you pass the structures by value, all bets are off, as the compiler may generate code that does not copy the padding.
In conclusion: if you use local structures, clear them with memset before use and do not pass them by value. There is no guarantee that padding will keep a 0 value, but that's the best you can do.
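Both techniques from this thread fit in a few lines. The Header struct below is a hypothetical stand-in for HeaderType (its real members depend on types not shown in the question), and the function names are invented. As noted above, the standard does not promise that the padding stays zero after the member stores; this only shows the practical recipe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for HeaderType: on typical ABIs there is a gap
// between tag and id, and tail padding after ack.
struct Header {
    char tag;
    int  id;
    bool ack;
};

// Recipe 1: clear every byte, padding included, then assign the members.
inline void init_header(Header& h, char tag, int id, bool ack) {
    std::memset(&h, 0, sizeof h);
    h.tag = tag;
    h.id  = id;
    h.ack = ack;
}

// Recipe 2: locate one specific gap with offsetof and clear only that.
inline void clear_gap_after_tag(Header& h) {
    unsigned char* base = reinterpret_cast<unsigned char*>(&h);
    const std::size_t start = offsetof(Header, tag) + sizeof h.tag;
    const std::size_t end   = offsetof(Header, id);
    assert(start <= end);
    std::memset(base + start, 0, end - start);  // no-op if there is no gap
}
```

Recipe 2 has to be repeated for every gap and for the tail padding, which is why the plain memset is usually the more maintainable choice.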

C++ Memory alignment in custom stack allocator

Usually data is aligned at power of two addresses depending on its size.
How should I align a struct or class with size of 20 bytes or another non-power-of-two size?
I'm creating a custom stack allocator, so I guess the compiler won't align data for me, since I'm working with a contiguous block of memory.
Some more context:
I have an Allocator class that uses malloc() to allocate a large amount of data.
Then I use a void* allocate(U32 size_of_object) method to return a pointer to where I can store whatever objects I need to store.
This way all objects are stored in the same region of memory and it will hopefully fit in the cache reducing cache misses.
C++11 has the alignof operator specifically for this purpose. Don't use any of the tricks mentioned in other posts, as they all have edge cases or may fail for certain compiler optimisations. The alignof operator is implemented by the compiler and knows the exact alignment being used.
See this description of c++11's new alignof operator
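As a small illustration of alignof, the sketch below rounds an address up to the alignment a type asks for. Particle is a hypothetical type that is 20 bytes on typical platforms, and align_up is a helper name invented here; it relies on alignments being powers of two, which the language guarantees for alignof results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: 20 bytes on typical platforms, yet its alignment is
// that of its strictest member (usually 4), not 20 and not 32.
struct Particle {
    float position[3];
    std::uint32_t id;
    float mass;
};

// Rounds addr up to the next multiple of alignment (a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return (addr + mask) & ~mask;
}
```

A call such as align_up(address, alignof(Particle)) then yields the first address at or after the given one where a Particle may legally live.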
Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned. http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
This says that the compiler takes care of it for you, 99.9% of the time. As for how to force an object to align a specific way, that is compiler specific, and only works in certain circumstances.
MSVC: http://msdn.microsoft.com/en-us/library/83ythb65.aspx
__declspec(align(32))
struct S{ int a, b, c, d; };
// the alignment must be a power of two, so align(20) is rejected
GCC: http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Type-Attributes.html
struct S{ int a, b, c, d; }
__attribute__ ((aligned (32))); // must also be a power of two
Before C++11 I don't know of a cross-platform way to do this, short of a macro that wraps the two forms above; since C++11 the standard alignas specifier covers it.
Unless you want to access memory directly, or squeeze the maximum amount of data into a block of memory, you don't need to worry about alignment: the compiler takes care of that for you.
Due to the way processor data buses work, what you want to avoid is 'mis-aligned' access. Usually you can read a 32 bit value in a single access from addresses which are multiples of four; if you try to read it from an address that's not such a multiple, the CPU may have to grab it in two or more pieces. So if you're really worrying about things at this level of detail, what you need to be concerned about is not so much the overall struct, as the pieces within it. You'll find that compilers will frequently pad out structures with dummy bytes to ensure aligned access, unless you specifically force them not to with a pragma.
Since you've now added that you actually want to write your own allocator, the answer is straight-forward: Simply ensure that your allocator returns a pointer whose value is a multiple of the requested size. The object's size itself will already come suitably adjusted (via internal padding) so that all member objects themselves are properly aligned, so if you request sizeof(T) bytes, all your allocator needs to do is to return a pointer whose value is divisible by sizeof(T).
If your object does indeed have size 20 (as reported by sizeof), then you have nothing further to worry about. (On a 64-bit platform, the object would probably be padded to 24 bytes.)
Update: In fact, as I only now came to realize, strictly speaking you only need to ensure that the pointer is aligned, recursively, for the largest member of your type. That may be more efficient, but aligning to the size of the entire type is definitely not getting it wrong.
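Putting the update into practice, a stack (bump) allocator only needs to round its current position up to the alignment of the requested type. The sketch below is one possible shape, not the asker's actual class: StackAllocator, allocate and allocate_for are illustrative names, there is no per-object deallocation, and the alignment is taken from alignof rather than from the object's size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Minimal bump allocator over one malloc'ed block.
class StackAllocator {
public:
    explicit StackAllocator(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(base_ ? capacity : 0),
          offset_(0) {}
    ~StackAllocator() { std::free(base_); }
    StackAllocator(const StackAllocator&) = delete;
    StackAllocator& operator=(const StackAllocator&) = delete;

    // Returns size bytes aligned to alignment (a power of two),
    // or nullptr if the block is exhausted.
    void* allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(base_) + offset_;
        const std::size_t padding =
            static_cast<std::size_t>(((current + mask) & ~mask) - current);
        const std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) return nullptr;
        offset_ += padding;           // skip bytes to reach an aligned address
        void* result = base_ + offset_;
        offset_ += size;              // bump past the new object
        return result;
    }

    // Convenience wrapper: storage suitable for one T.
    template <typename T>
    void* allocate_for() { return allocate(sizeof(T), alignof(T)); }

private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

An object is then constructed with placement new, for example new (alloc.allocate_for<Foo>()) Foo(...), after checking the returned pointer for nullptr.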
How should I align a struct or class with size of 20 bytes or another non-power-of-two size?
Alignment is CPU-specific, so there is no answer to this question without, at least, knowing the target CPU.
Generally speaking, alignment isn't something that you have to worry about; your compiler will have the rules implemented for you. It does come up once in a while, like when writing an allocator. The classic solution is discussed in The C Programming Language (K&R): use the worst possible alignment. malloc does this, although it's phrased as, "the pointer returned if the allocation succeeds shall be suitably aligned so that it may be assigned to a pointer to any type of object."
The way to do that is to use a union (the elements of a union are all allocated at the union's base address, and the union must therefore be aligned in such a way that each element could exist at that address; i.e., the union's alignment will be the same as the alignment of the element with the strictest rules):
typedef long Align;
union header {
// the inner struct has the important bookkeeping info
struct {
unsigned size;
header* next;
} s;
// the align member only exists to make sure headers are always allocated
// using the alignment of a long, which is probably the worst alignment
// for the target architecture ("worst" == "strictest," something that meets
// the worst alignment will also meet all better alignment requirements)
Align align;
};
Memory is allocated by creating an array (using something like sbrk()) of headers large enough to satisfy the request, plus one additional header element that actually contains the bookkeeping information. If the array is called arry, the bookkeeping information is at arry[0], while the pointer returned points at arry[1] (the next element is meant for walking the free list).
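In C++11 the guess that long is the strictest type can be replaced by std::max_align_t, whose alignment is at least that of every scalar type. The sketch below is a variation on the K&R header, not the book's code: the names Header and units_needed are chosen here, and only the sizing arithmetic is shown, not the free list.

```cpp
#include <cassert>
#include <cstddef>

// K&R-style block header, with std::max_align_t replacing the "long" guess.
union Header {
    struct {
        std::size_t size;  // block size, counted in Header units
        Header* next;      // next block on the free list
    } s;
    std::max_align_t align;  // never used; only forces the strictest alignment
};

static_assert(alignof(Header) == alignof(std::max_align_t),
              "every Header starts on a maximally aligned boundary");

// Header-sized units needed for nbytes of payload, plus one unit for the
// bookkeeping header that sits in front of the payload.
inline std::size_t units_needed(std::size_t nbytes) {
    assert(nbytes > 0);
    return (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
}
```

Because each unit is a whole Header, the payload that starts at index 1 of the array is aligned for any ordinary type, at the cost of rounding every request up to a multiple of sizeof(Header).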
This works, but can lead to wasted space ("In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary"). I'm aware of a better approach that tries to get a type-specific alignment instead of "the alignment that will work for anything."
Compilers also often have compiler-specific commands. They aren't standard, and they require that you know the correct alignment requirements for the types in question. I would avoid them.