C++ alignment of class - member call on misaligned address - c++

I'm using UBSAN and am getting the following error. Note that I'm compiling with clang 6.0.1 with -fsanitize=undefined. I've read a number of background questions on SO and still can't solve my particular issue. Here are the background questions for reference:
What is the recommended way to align memory in C++11
Misaligned address using virtual inheritance
runtime error: member call on misaligned address 0x000001f67230 for type 'const A *', which requires 64 byte alignment
0x000001f67230: note: pointer points here
00 00 00 00 c0 72 f6 01 00 00 00 00 08 00 00 00 00 02 00 00 40 02 00 00 00 00 00 00 00 00 00 00
Here are some things to note about class C:
the object of type C is created using new (C* o = new C();)
type C has a member of type A that has 64 byte alignment. I verified this using alignof.
C is declared using class alignas(64) C -- but that doesn't solve my problem
My current hypothesis is that I need to use the C++11 equivalent of the C++17 std::aligned_alloc to create the object using aligned storage. But, I'm not sure how to best do this or if it will actually solve my problem. I would prefer to solve the problem once in the definition of class C as opposed to every time I create a C, if possible. What is the recommended approach to solve this issue to remove the UBSAN error?

If your class already has a member that requires 64-byte alignment, then the class itself necessarily has 64-byte alignment as well. So adding an explicit alignas(64) does not change anything.
The basic problem here is that allocation functions (in C++11) are only required to return memory aligned to fundamental alignment. C++11 left it implementation-defined whether over-aligned types are supported by new or not [expr.new]/1. C++17 introduced new-extended alignment and additional allocation functions to deal with that (if and which new-extended alignments are supported, however, is still implementation-defined).
If you can switch to a compiler that supports C++17, chances are that your code will just work. Otherwise you will probably have to either use some implementation-specific function to allocate aligned memory or just roll your own solution, e.g., based on std::align and placement new (which would work in C++11 too)…
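A minimal C++11 sketch of the "roll your own" route, solved once in the class definition: a class-specific operator new over-allocates with std::malloc, aligns the block with std::align, and stores the raw pointer just before the aligned block so that operator delete can free it. The definitions of A and C here are hypothetical stand-ins for the question's types, and array forms (new C[n]) are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical stand-in for the question's 64-byte-aligned member type.
struct alignas(64) A { char data[64]; };

// Hypothetical stand-in for the question's class C.
class C {
public:
    A a;
    int extra = 0;

    // Over-allocate, align with std::align, and remember the raw pointer
    // in the slot immediately before the aligned block.
    static void* operator new(std::size_t size) {
        const std::size_t alignment = alignof(C);
        std::size_t space = size + alignment + sizeof(void*);
        void* raw = std::malloc(space);
        if (!raw) throw std::bad_alloc();

        // Reserve room for the bookkeeping pointer, then align what is left.
        void* p = static_cast<char*>(raw) + sizeof(void*);
        space -= sizeof(void*);
        void* aligned = std::align(alignment, size, p, space);
        if (!aligned) { std::free(raw); throw std::bad_alloc(); }

        static_cast<void**>(aligned)[-1] = raw;
        return aligned;
    }

    // Recover the raw pointer stored by operator new and free it.
    static void operator delete(void* ptr) noexcept {
        if (ptr) std::free(static_cast<void**>(ptr)[-1]);
    }
};
```

With this in place, a plain `C* o = new C();` followed by `delete o;` goes through the aligned allocation path without any change at the call sites.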

Related

_CrtMemDumpAllObjectsSince not returning expected results

I'm using _CrtMemCheckpoint and _CrtMemDumpAllObjectsSince to track possible memory leaks in my dll.
In DllMain when DLL_PROCESS_ATTACH is detected an init function is called which calls _CrtMemCheckpoint(&startState) on the global _CrtMemState variable startState. When DLL_PROCESS_DETACH is detected an exit function is called that calls _CrtMemDumpAllObjectsSince(&startState). This returns
ExitInstance()Dumping objects ->
{8706} normal block at 0x07088200, 8 bytes long.
Data: <p v > 70 FF 76 07 01 01 CD CD
{8705} normal block at 0x07084D28, 40 bytes long.
Data: < > 00 00 00 10 FF FF FF FF FF FF FF FF 00 00 00 00
{4577} normal block at 0x070845F0, 40 bytes long.
Data: <dbV > 64 62 56 0F 01 00 00 00 FF FF FF FF FF FF FF FF
{166} normal block at 0x028DD4B8, 40 bytes long.
Data: <dbV > 64 62 56 0F 01 00 00 00 FF FF FF FF FF FF FF FF
{87} normal block at 0x02889BA8, 12 bytes long.
Data: < P > DC 50 90 02 00 00 00 00 01 00 00 00
So far so good, except that the last three entries (4577, 166 and 87) are also in startState. That is, if I run _CrtDumpMemoryLeaks() in my Init function and in my Exit function, those entries appear in both lists.
The documentation says this:
_CrtMemDumpAllObjectsSince uses the value of the state parameter to determine where to initiate the dump operation. To begin dumping from
a specified heap state, the state parameter must be a pointer to a
_CrtMemState structure that has been filled in by _CrtMemCheckpoint before _CrtMemDumpAllObjectsSince was called.
Which makes me believe that items tracked in startState would be excluded from the output. At the end of the Init function where _CrtMemCheckpoint is called there have been about 4700 allocation calls. Shouldn't _CrtMemDumpAllObjectsSince only dump objects allocated after that checkpoint call?
What have I missed?
Short
This only looks strange: the function does its job (in part), but in an overzealous way.
These functions are decades old, so they are not buggy, but they are not entirely well designed either.
The truth is that something in the "old" state changed after your "since" state.
So the real question is: "Yes, it reflects a change since the checkpoint, but is it a harmful leak?"
This happens frequently, and is amplified by delayed initialization in DLLs,
and by complex objects such as map/string/array/list, which delay the allocation of their internal buffers.
The bad news is that nearly all complex objects declared "static" are in fact initialized on first use.
So these changes ought to be shown by _CrtMemDumpAllObjectsSince, because their memory allocation changed.
Unfortunately, the output is so crude and unfiltered that it also shows many irrelevant (unmodified) blocks.
The typical biggest culprit is "realloc", which changes the state of an old allocation.
This may look even stranger, because the extra entries may disappear,
for example when a genuine malloc is made after the state snapshot: that allocation performs a kind of "reset" of the low-water marker used for the dump, setting it to a higher level. This magically makes a bunch of your "extra" entries disappear.
The behavior is even more erratic with multithreading, where it easily becomes non-reproducible.
Note:
The fact that the dump shows no file name and line number is a sign that these are pre-initialization allocations.
So the culprits are most likely complex static objects initialized BEFORE main() (or InitInstance()).
Long:
_CrtMemDumpAllObjectsSince is painful!
In fact, its output can be so cluttered with useless information that it defeats the purpose of simple day-to-day use,
despite the good idea behind it.
Workaround
There is no simple one!
You may try doing a malloc followed by a free AFTER you take your "since" state snapshot, in order to nudge this marker.
But to be safer and more in control, I unfortunately saw no way around writing my own _MyCrtMemDumpAllObjectsSince, which dumps from the original MS structure.
It was inspired by static void __cdecl dump_all_object_since_nolock(_CrtMemState const* const state) throw()
(see "debug_heap.cpp", copyright Microsoft).
The code is available "as is", more for inspiration.
But first, some explanation of how _CrtMemState works:
A _CrtMemState has a pointer 'pBlockHeader', which is usually a link into a doubly linked list of '_CrtMemBlockHeader*'.
This list is in fact more than a snapshot taken at the time it is built: it is a selection (it is unclear how it is made) of all the memory blocks in use, arranged so that the "current state" is pointed to directly by 'pBlockHeader'.
So following
-> _block_header_prev allows the exploration of older blocks
-> _block_header_next allows the exploration of newer blocks (the juice you are looking for in the concept of "since", but very dangerous as there is no end marker)
The TRICKY part:
MS maintains a vital internal static _CrtMemBlockHeader* called __acrt_first_block.
This __acrt_first_block changes continuously during alloc and realloc.
However, the _MyCrtMemDumpAllObjectsSince dump starts from this __acrt_first_block and goes forward (using _block_header_next) until it finds a NULL pointer.
The first block handled is determined by this __acrt_first_block, and the 'state' you pass in is nothing more than a STOP marker for the dump.
In other words, _CrtMemDumpAllObjectsSince doesn't really dump the "since" state;
it dumps from __acrt_first_block, as it currently is, to your "since" state.
The 'for' loop is overkill: it shows blocks from a 'start' (since) to an 'end' (oldest modified since).
This makes sense, but it also includes blocks that have NOT been modified, showing things we don't care about.
The MS structure is clever and can be used directly, although there is no guarantee that Microsoft will keep the same vital _CrtMemBlockHeader structure in the future.
But over the last 15 years I haven't seen a bit of change in it (nor do I foresee any reason why they would change such a strategic and critical structure).
I dislike copying and pasting MS code and resolving linker issues with my own piggyback code.
So the workaround I used is based on the ability to intercept the text messages sent to the "Output" window, decoding and storing ALL the information in my own bank.
Structurally, the code below gives an idea of the interception, using a static struct under lock to store all the information:
_CrtSetReportHook2(_CRT_RPTHOOK_INSTALL,MyReportHookDumpFilter);
_CrtMemDumpAllObjectsSince(state); // CONTRARY to what it says, this thing seems to dump everything until old_state
_CrtSetReportHook2(_CRT_RPTHOOK_REMOVE,MyReportHookDumpFilter);
_MyReportHookDumpFilterCommand(_CUST_SORT,NULL);
_MyReportHookDumpFilterCommand(_CUST_DUMP,NULL);
The `_MyReportHookDumpFilterCommand` checks for pre-existing blocks that were NOT modified at all and avoids displaying them during its dump phase.
Take it as inspiration for code to ease the display.
If anybody has a simpler way to use it, please share!
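As a rough illustration of the filtering idea, here is a portable sketch of what the decoding step might look like once the dump lines have been captured as text. This is not the answerer's code: parse_request_number and filter_dump are hypothetical helpers, and the sketch assumes that a block's allocation request number (the value in braces) is enough to tell pre-checkpoint blocks from newer ones, which may not hold in every case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: extract the allocation request number from a dump
// header line such as "{8706} normal block at 0x07088200, 8 bytes long.".
// Returns -1 for lines that do not start with "{<digits>}" (e.g. "Data:" lines).
long parse_request_number(const std::string& line) {
    if (line.size() < 3 || line[0] != '{') return -1;
    const std::size_t close = line.find('}');
    if (close == std::string::npos || close == 1) return -1;
    for (std::size_t i = 1; i < close; ++i)
        if (line[i] < '0' || line[i] > '9') return -1;
    return std::strtol(line.substr(1, close - 1).c_str(), nullptr, 10);
}

// Hypothetical filter: keep only blocks whose request number is at least
// first_new_request, together with the "Data:" line that follows each one.
std::vector<std::string> filter_dump(const std::vector<std::string>& lines,
                                     long first_new_request) {
    std::vector<std::string> kept;
    bool keep_current = false;
    for (const std::string& line : lines) {
        const long request = parse_request_number(line);
        if (request >= 0) {
            keep_current = request >= first_new_request;  // new block header
        } else if (line.compare(0, 5, "Data:") != 0) {
            keep_current = false;  // unrelated line, e.g. the dump banner
        }
        if (keep_current) kept.push_back(line);
    }
    return kept;
}
```

With the question's dump and a threshold of 4700 (the approximate allocation count the asker reports at checkpoint time), this would keep blocks 8706 and 8705 and drop 4577, 166 and 87.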

C++ struct member alignment and packing requirements on ARM

I'm wanting to make a struct form/layout more "defined/fixed" and less "up to the compiler's discretion". The struct layout will be shared when communicated between x86_64 and ARMv7-A architectures. Yes, it's not portable in general, but for this more restricted case, the endianness is the same (and could be converted if decided to be used on a different platform).
Are there alignment requirements for different data types/sizes on ARMv7-A? (i.e. misusing them is undefined behaviour)
Or can it pack them to any alignment? (i.e. it is all defined behaviour)
Do some alignments give better performance than others?
I had been reading on packing/alignment requirements for ARM, but unfortunately I've noticed it's a bit dated relative to my architecture.
http://www.aleph1.co.uk/chapter-10-arm-structured-alignment-faq
I have been using headers like this, on both architectures:
#include <array>
#include <cstdint>

#pragma pack(4)
struct foo
{
    uint8_t bar1;              // 1 byte, then 3 padding bytes
    std::array<double,1> bar2; // 8 bytes
};
#pragma pack()
I am using GCC cross compiler for ARM: gcc -Wall -Wextra -Wcast-align -march=armv7-a -mfloat-abi=softfp -mfpu=neon -mtune=cortex-a9
When I call foo abc; abc.bar2.data(); and compile with -fsanitize=undefined -fsanitize=address, it produces a runtime error:
runtime error: member call on misaligned address 0xbeeb0c44 for type 'struct array', which requires 8 byte alignment
0xbeeb0c44: note: pointer points here
01 00 00 00 03 00 00 00 03 00 00 00 01 00 00 00 f4 0d eb be fc 0d eb be c0 a5 00 00 00 00 db 4b
^
/sysroot.../usr/include/c++/5.2.0/array:230:32: runtime error: reference binding to misaligned address 0xbeeb0c44 for type 'const double', which requires 8 byte alignment
0xbeeb0c44: note: pointer points here
01 00 00 00 03 00 00 00 03 00 00 00 01 00 00 00 f4 0d eb be fc 0d eb be c0 a5 00 00 00 00 db 4b
^
I like to trust the sanitiser, and it makes me think that's bad. However, if I turn off the sanitisers and crank up optimisation to -O3, it behaves okay. However I might just be (un)lucky, and this case of undefined behaviour just appears to work fine. I remember earlier I had the -Wcast-align warning triggered when I did pack(1) instead of pack(4), but I can't remember how I was accessing it to get that triggered. I assume that is also indicating that it was likely undefined behaviour. Is it true that the address sanitiser and -Wcast-align were indicating undefined behaviours for this architecture, even though it appeared to work?
Would it be recommended to increase to pack(8) to fix the undefined behaviour? It does unfortunately increase the memory usage.
Finally, is pragma pack(n) or __attribute__((packed)) for each struct entity the preferred way of doing this? (__attribute__((packed)) is a GCC extension, and unfortunately can't specify the pack size.)
Unaligned accesses can always carry a performance penalty, because a single access may have to touch two cache lines.
I guess that the exact reaction to unaligned accesses (slowdown or fault) is not defined by the architecture description but may be left to the implementation.
If you just follow the old habit of ordering the fields in a struct by descending size, then all of today's C and C++ compilers will produce the same memory layout. I would suggest that route to save you from grief.
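A small sketch of the descending-size ordering applied to the question's struct, with compile-time checks on the member offsets. The name foo_ordered and the helper bar2_is_aligned are hypothetical; the point is only that, without #pragma pack, the double ends up at its natural alignment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reordering of the question's struct: largest member first,
// no #pragma pack, so every member sits at its natural alignment.
struct foo_ordered {
    std::array<double, 1> bar2; // 8 bytes at offset 0
    std::uint8_t bar1;          // 1 byte at offset 8, followed by tail padding
};

// Fail the build on any target where the layout differs from what is expected.
static_assert(offsetof(foo_ordered, bar2) == 0, "bar2 should come first");
static_assert(offsetof(foo_ordered, bar1) == sizeof(double),
              "bar1 should directly follow bar2");

// Returns true when the double inside the given object is suitably aligned.
bool bar2_is_aligned(const foo_ordered& f) {
    const auto address = reinterpret_cast<std::uintptr_t>(f.bar2.data());
    return address % alignof(double) == 0;
}
```

The amount of tail padding still depends on the ABI, so if the struct is exchanged between machines it is probably worth adding a static_assert on sizeof(foo_ordered) for each target as well.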

Visual studio 2010 - data segment and stack memory are same

I figured out (from SO) that a string literal gets placed in the data segment of the program and is read-only, and hence the line " s[0] = 'a' " would cause an error, which is what actually happened when I uncommented that line and ran the program. However, when I looked at the memory window in MS VS, the variables were all placed together in memory. I am curious as to how the compiler enforces read-only access to 's'.
#include <cstdio>   // printf
#include <cstdlib>  // system
int main(void)
{
    char *s = "1023";        // accepted by VS2010; standard C++11 requires const char *
    char s_arr[] = "4237";
    char *d = "5067";
    char s_arr_1[] = "9999";
    char *e = "6789";
    printf("%c\n", s[0]);
    // s[0] = 'a'; This line would error out since s should point to data segment of the program
    printf("%s\n", s);
    system("pause");
}
0x002E54F4 31 30 32 33 00 00 00 00 34 32 33 37 00 00 00 00 1023....4237....
0x002E5504 35 30 36 37 00 00 00 00 39 39 39 39 00 00 00 00 5067....9999....
0x002E5514 36 37 38 39 00 00 00 00 25 63 0a 00 25 73 0a 00 6789....%c..%s..
0x002E5524 70 61 75 73 65 00 00 00 00 00 00 00 43 00 3a 00 pause.......C.:.
Edit 1:
Updating the value stored in s_arr (which should be placed in stack space) to make it clear that it is placed adjacent to the string constants.
Edit 2: Since I am seeing answers regarding ro/rw access based on pages,
Here, address 0x...4f4 is rw, 0x...4fc is ro, and 0x...504 is rw again. How do they achieve this granularity? Also, since each page is at least 4 KB, one could argue that 0x4fb could be the last address of the previous ro page. But I have now added a few more variables to show that they are all placed contiguously in memory and that the granularity is per 8 bytes.
You could say, Since pages are at 4k level as you mentioned,
I don't know what made you think that your example shows modifiable memory next to non-modifiable memory. What "granularity" are you talking about? Your memory dump does not show anything like that.
The string "4237" that you see in your memory dump is not your s_arr. That "4237" is a read-only string literal that was used as an initializer for s_arr. That initializer was copied to s_arr. Meanwhile, the actual s_arr resides somewhere else (on the stack) and is perfectly modifiable. It contains "4237" as well (as its initial value), but that is a completely different "4237", which you don't see in your memory dump. Ask your program to print the address of s_arr and you will see that it is nowhere near the memory range that you dumped.
Again, your claim that "0x...4f4 is rw 0x...4fc is ro and again 0x...504 is rw" is completely incorrect. All these addresses are read-only. None of them are read-write. There is no "granularity" there whatsoever.
Remember that a declaration like this
char s_arr[] = "4237";
is really equivalent to
const char *unnamed = "4237";
char s_arr[5];
memcpy(s_arr, unnamed, 5);
In your memory dump, you are looking at that unnamed address from my example above. That memory region is read-only. Your s_arr resides in completely different memory region, which is read-write.
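The distinction between the literal and the array initialized from it can be checked directly. The sketch below uses a hypothetical helper, array_is_separate_copy, that writes to the array and confirms that the literal is untouched and lives at a different address.

```cpp
#include <cassert>
#include <cstring>

// Returns true when s_arr is a separate, writable copy of the literal.
bool array_is_separate_copy() {
    const char* literal = "4237"; // points at the read-only string literal
    char s_arr[] = "4237";        // array initialized by copying the literal
    s_arr[0] = 'a';               // fine: modifies only the array's own storage
    return static_cast<const void*>(s_arr) != static_cast<const void*>(literal)
        && std::strcmp(literal, "4237") == 0
        && std::strcmp(s_arr, "a237") == 0;
}
```

Printing both addresses (for example with printf and %p) would show the array on the stack, far away from the literal's address range in the dump.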
Since 32-bit platforms were introduced, everything is placed into the same segment (this is not exactly so, but it is easier to think of it this way; there are minor caveats that take several pages to explain and that apply to operating system design).
The 32-bit address space is split into pages. Intel allows read-only bits to be assigned at page granularity. Debuggers display only the 32-bit (or 64-bit) address, which technically is an offset within the segment. It is fine to call this offset simply an address; there is no mistake in that.
Nevertheless, linkers refer to different memory areas as segments. These segments have nothing to do with Intel memory segments. Linker segments (code, data, stack, etc.) are loaded into different pages. These pages get different attributes (RO/RW, execute permission, etc.).
The block of memory you are showing is the area where string constants are stored (as you can see, all 4 values are there, one right next to another). This area is marked as read-only. On Windows, each 4 KB block of memory (page) can have its own attributes (read/write/execute), so even two adjacent locations can have different access flags if they fall on different pages.
The area where the variables live is in a different location (the stack, in your case). You can see this by checking the value of &s in the Immediate window (or the Watch window).

Exception handler

There is this code:
char text[] = "zim";
int x = 777;
If I look at the stack where x and text are placed, the output is:
09 03 00 00 7a 69 6d 00
Where:
09 03 00 00 = 0x309 = 777 <- int x = 777
7a 69 6d 00 = char text[] = "zim" (ASCII code)
There is now code with try..catch:
char text[] = "zim";
try{
int x = 777;
}
catch(int){
}
Stack:
09 03 00 00 **97 85 04 08** 7a 69 6d 00
Now between text and x is placed new 4 byte value. If I add another catch, then there will be something like:
09 03 00 00 **97 85 04 08** **xx xx xx xx** 7a 69 6d 00
and so on. I think that this is some value connected with exception handling, and that it is used during stack unwinding to find the appropriate catch when an exception is thrown in the try block. However, the question is: what exactly is this 4-byte value (maybe an address of some exception handler structure, or some ID)?
I use g++ 4.6 on 32 bit Linux machine.
AFAICT, that's a pointer to an "unwind table". Per the Itanium ABI implementation suggestions, the process "[uses] an unwind table, [to] find information on how to handle exceptions that occur at that PC, and in particular, get the address of the personality routine for that address range."
The idea behind unwind tables is that the data needed for stack unwinding is rarely used. Therefore, it's more efficient to put a pointer on the stack and store the rest of the data in another page. In the best cases, that page can remain on disk and doesn't even need to be loaded into RAM. In comparison, C-style error handling often ends up in the L1 cache because it's all inline.
Needless to say all this is platform-dependent and etc.
This may be an address. It may point to a code section (some handler address), to a data section (a pointer to a build-time-generated structure with frame info), or to the stack of the same thread (a pointer to a run-time-generated table of frame info).
Or it may be garbage, left there because of an alignment requirement that EH may demand.
For instance, on Win32/x86 there is no such gap. For every function that uses exception handling (it has try/catch, __try/__except/__finally, or objects with destructors), the compiler generates an exception registration record (EXCEPTION_REGISTRATION_RECORD) that is allocated on the stack (by the function prolog code). Then, whenever something changes within the function (an object is created or destroyed, a try/catch block is entered or exited), the compiler adds an instruction that modifies this structure (more precisely, its extension). But nothing more is allocated on the stack.

How to ignore false positive memory leaks from _CrtDumpMemoryLeaks?

It seems whenever there are static objects, _CrtDumpMemoryLeaks returns a false positive claiming it is leaking memory. I know this is because they do not get destroyed until after the main() (or WinMain) function. But is there any way of avoiding this? I use VS2008.
I found that if you tell it to check memory automatically after the program terminates, it allows all the static objects to be accounted for. I was using log4cxx and boost which do a lot of allocations in static blocks, this fixed my "false positives"...
Add the following line, instead of invoking _CrtDumpMemoryLeaks, somewhere in the beginning of main():
_CrtSetDbgFlag ( _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF );
For more details on usage and macros, refer to MSDN article:
http://msdn.microsoft.com/en-us/library/5at7yxcs(v=vs.71).aspx
Not a direct solution, but in general I've found it worthwhile to move as much allocation as possible out of static initialization time. It generally leads to headaches (initialization order, de-initialization order etc).
If that proves too difficult you can call _CrtMemCheckpoint (http://msdn.microsoft.com/en-us/library/h3z85t43%28VS.80%29.aspx) at the start of main(), and _CrtMemDumpAllObjectsSince at the end.
1) You said:
It seems whenever there are static objects, _CrtDumpMemoryLeaks returns a false positive claiming it is leaking memory.
I don't think this is correct. EDIT: Static objects are not created on heap. END EDIT: _CrtDumpMemoryLeaks only covers crt heap memory. Therefore these objects are not supposed to return false positives.
However, it is another thing if static variables are objects which themselves hold some heap memory (if for example they dynamically create member objects with operator new()).
2) Consider using _CRTDBG_LEAK_CHECK_DF in order to activate memory leak check at the end of program execution (this is described here: http://msdn.microsoft.com/en-us/library/d41t22sb(VS.80).aspx). I suppose then memory leak check is done even after termination of static variables.
Old question, but I have an answer. I am able to split the report into false positives and real memory leaks. In my main function, I initialize the memory debugging and generate a real memory leak at the very beginning of my application (never delete pcDynamicHeapStart):
int main()
{
_CrtSetDbgFlag( _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF );
char* pcDynamicHeapStart = new char[ 17u ];
strcpy_s( pcDynamicHeapStart, 17u, "DynamicHeapStart" );
...
After my application is finished, the report contains
Detected memory leaks!
Dumping objects ->
{15554} normal block at 0x00000000009CB7C0, 80 bytes long.
Data: < > DD DD DD DD DD DD DD DD DD DD DD DD DD DD DD DD
{14006} normal block at 0x00000000009CB360, 17 bytes long.
Data: <DynamicHeapStart> 44 79 6E 61 6D 69 63 48 65 61 70 53 74 61 72 74
{13998} normal block at 0x00000000009BF4B0, 32 bytes long.
Data: < ^ > E0 5E 9B 00 00 00 00 00 F0 7F 9C 00 00 00 00 00
{13997} normal block at 0x00000000009CA4B0, 8 bytes long.
Data: < > 14 00 00 00 00 00 00 00
{13982} normal block at 0x00000000009CB7C0, 16 bytes long.
Data: < # > D0 DD D6 40 01 00 00 00 90 08 9C 00 00 00 00 00
...
Object dump complete.
Now look at the line "Data: <DynamicHeapStart> 44 79 6E 61 6D 69 63 48 65 61 70 53 74 61 72 74".
All reported leaks below it are false positives; all those above it are real leaks.
"False positive" does not mean there is no leak (it could be a statically linked library that allocates heap memory at startup and never frees it), but you cannot eliminate that leak, and it is no problem at all.
Since I came up with this approach, I have not had any leaking applications.
I provide this here in the hope that it helps other developers get stable applications.
Can you take a snapshot of the currently allocated objects every time you want a list? If so, you could remove the initially allocated objects from the list when you are looking for leaks that occur in operation. In the past, I have used this to find incremental leaks.
Another solution might be to sort the leaks and only consider duplicates for the same line of code. This should rule out static variable leaks.
Jacob
Ach. If you are sure that _CrtDumpMemoryLeaks() is lying, then you are probably correct. Most alleged memory leaks that I see are down to incorrect calls to _CrtDumpMemoryLeaks(). I agree entirely with the following: _CrtDumpMemoryLeaks() dumps all open handles. But your program probably already has open handles, so be sure to call _CrtDumpMemoryLeaks() only when all handles have been released. See http://www.scottleckie.com/2010/08/_crtdumpmemoryleaks-and-related-fun/ for more info.
I can recommend Visual Leak Detector (it's free) rather than using the stuff built into VS. My problem was using _CrtDumpMemoryLeaks with an open source library that created 990 lines of output, all false positives so far as I can tell, as well as some things coming from boost. VLD ignored these and correctly reported some leaks I added for testing, including in a native DLL called from C#.