Unaligned load/store in gcc vector extension

I need to access unaligned values using the GCC vector extension.
The program below crashes with both Clang and GCC:
typedef int __attribute__((vector_size(16))) int4;
typedef int __attribute__((vector_size(16),aligned(4))) *int4p;
int main()
{
int v[64] __attribute__((aligned(16))) = {};
int4p ptr = reinterpret_cast<int4p>(&v[7]);
int4 val = *ptr;
}
However if I change
typedef int __attribute__((vector_size(16),aligned(4))) *int4p;
to
typedef int __attribute__((vector_size(16),aligned(4))) int4u;
typedef int4u *int4up;
The generated assembly code is correct (using unaligned load) - in both clang and gcc.
What is wrong with the single definition, or what am I missing? Can it be the same bug in both Clang and GCC?
Note: it happens in both clang and gcc

TL;DR
You've altered the alignment of the pointer type itself, not the pointee type. This has nothing to do with the vector_size attribute and everything to do with the aligned attribute. It's also not a bug, and it's implemented correctly in both GCC and Clang.
Long Story
From the GCC documentation, § 6.33.1 Common Type Attributes (emphasis added):
aligned (alignment)
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. [...]
The type in question is the type being declared, not the type pointed to by the type being declared. Therefore,
typedef int __attribute__((vector_size(16),aligned(4))) *int4p;
declares a new type T that points to objects of type *T, where:
*T is a 16-byte vector with default alignment for its size (16 bytes)
T is a pointer type, and the variables of this type may be exceptionally stored aligned to as low as 4-byte boundaries (even though what they point to is a type *T that is far more aligned).
Meanwhile, § 6.49 Using Vector Instructions through Built-in Functions says (emphasis added):
On some targets, the instruction set contains SIMD vector instructions which operate on multiple values contained in one large register at the same time. For example, on the x86 the MMX, 3DNow! and SSE extensions can be used this way.
The first step in using these extensions is to provide the necessary data types. This should be done using an appropriate typedef:
typedef int v4si __attribute__ ((vector_size (16)));
The int type specifies the base type, while the attribute specifies the vector size for the variable, measured in bytes. For example, the declaration above causes the compiler to set the mode for the v4si type to be 16 bytes wide and divided into int sized units. For a 32-bit int this means a vector of 4 units of 4 bytes, and the corresponding mode of foo is V4SI.
The vector_size attribute is only applicable to integral and float scalars, although arrays, pointers, and function return values are allowed in conjunction with this construct. Only sizes that are a power of two are currently allowed.
Demo
#include <stdio.h>
typedef int __attribute__((aligned(128))) * batcrazyptr;
struct batcrazystruct{
batcrazyptr ptr;
};
int main()
{
printf("Ptr: %zu\n", sizeof(batcrazyptr));
printf("Struct: %zu\n", sizeof(batcrazystruct));
}
Output:
Ptr: 8
Struct: 128
Which is consistent with batcrazyptr ptr itself having its alignment requirement changed, not its pointee, and in agreement with the documentation.
Solution
I'm afraid you'll be forced to use a chain of typedefs, as you have done with int4u. It would be unreasonable to have a separate attribute to specify the alignment of each pointer level in a typedef.
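As a rough sketch of that typedef chain, assuming GCC or Clang (the vector extension is not available in MSVC): `load_unaligned` and `demo_unaligned_load` are hypothetical helper names that do not appear in the question, and the alignment check reflects what these compilers report for the lowered typedef.

```cpp
#include <cassert>

// GCC/Clang only: vector_size and aligned are compiler extensions.
typedef int __attribute__((vector_size(16))) int4;

// The attributes apply to the type being declared (int4u), so aligned(4)
// lowers the alignment of the vector type itself, not of a pointer to it.
typedef int __attribute__((vector_size(16), aligned(4))) int4u;
typedef int4u *int4up;

static_assert(alignof(int4u) == 4, "int4u should only require 4-byte alignment");

// Hypothetical helper: load four ints starting at p, where p only needs
// the alignment of a plain int.
inline int4 load_unaligned(int *p) {
    int4up ptr = reinterpret_cast<int4up>(p);
    return *ptr;  // the pointee type is under-aligned, so no aligned load is assumed
}

// Hypothetical self-check: read from &v[7], which is not 16-byte aligned.
inline bool demo_unaligned_load() {
    alignas(16) int v[64];
    for (int i = 0; i < 64; ++i) v[i] = i;
    int4 val = load_unaligned(&v[7]);
    return val[0] == 7 && val[1] == 8 && val[2] == 9 && val[3] == 10;
}
```

Calling `assert(demo_unaligned_load());` from a test is one way to confirm that the load no longer faults.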

Related

If `atomic<T>` is lock free and has the same size as `T`, will the memory layout be the same?

This question here indicates that std::atomic<T> is generally supposed to have the same size as T, and indeed that seems to be the case for gcc, clang, and msvc on x86, x64, and ARM.
In an implementation where std::atomic<T> is always lock free for some type T, is its memory layout guaranteed to be the same as the memory layout of T? Are there any additional special requirements imposed by std::atomic, such as alignment?

Upon reviewing [atomics.types.generic], which the answer you linked quotes in part, the only remark regarding alignment is the note which you saw before:
Note: The representation of an atomic specialization need not have the same size as its corresponding argument type. Specializations should have the same size whenever possible, as this reduces the effort required to port existing code
In a newer version:
The representation of an atomic specialization
need not have the same size and alignment requirement as
its corresponding argument type.
Moreover, at least one architecture, IA64, gives a requirement for atomic behavior of instructions such as cmpxchg.acq, which indicates that it's likely that a compiler targeting IA64 may need to align atomic types differently than non-atomic types, even in the absence of a lock.
Furthermore, the use of a compiler feature such as packed structs will cause alignment to differ between atomic and non-atomic variants. Consider the following example:
#include <atomic>
#include <iostream>
struct __attribute__ ((packed)) atom{
char a;
std::atomic_long b;
};
struct __attribute__ ((packed)) nonatom{
char a;
long b;
};
atom atom1;
nonatom nonatom1;
void disp_aligns() {
    // alignof applied to an expression is a GNU extension
    std::cout << alignof(atom1.b) << std::endl;
    std::cout << alignof(nonatom1.b) << std::endl;
}
On at least one configuration, the alignment of atom1.b will be on an 8-byte boundary, while the alignment of nonatom1.b will be on a 1-byte boundary. However, this is under the supposition that we requested that the structs be packed; it's not clear whether you are interested in this case.

From the standard:
The representation of an atomic specialization need not have the same size and alignment requirement as its corresponding argument type.
So the answer, at least for now, is no: it is not guaranteed to be the same size, nor to have the same alignment. But it might have, unless it doesn't and then it won't.
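Since nothing is guaranteed, code that depends on matching layout can at least check it at compile time. A minimal sketch, where `atomic_layout_matches` is a hypothetical helper name; note that equal size and alignment is only a necessary condition and does not by itself make it valid to reinterpret a T as a std::atomic<T>:

```cpp
#include <atomic>

// Hypothetical helper: reports whether std::atomic<T> has the same size and
// alignment requirement as T on the implementation compiling this code.
template <typename T>
constexpr bool atomic_layout_matches() {
    return sizeof(std::atomic<T>) == sizeof(T) &&
           alignof(std::atomic<T>) == alignof(T);
}

// Fail the build instead of silently relying on the assumption.
static_assert(atomic_layout_matches<int>(),
              "std::atomic<int> differs from int in size or alignment");
```

The same check can be repeated for any other T the code cares about; for some types and targets it may well fail, which is exactly the case the standard's note allows.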

Why does this code produce the wrong alignment with MSVC?

I have tested this code on ideone.com and it outputs 16 as it should. However when I try it in Visual Studio 2013 it shows 8. Is it a bug or lack of C++11 support from the compiler?
#include <iostream>
#include <type_traits>
using namespace std;
using float_pack = aligned_storage<4 * sizeof(float), 16>::type;
int main() {
cout << alignment_of<float_pack>::value << endl;
return 0;
}
I have used alignment_of because MSVC doesn't support alignof.
Edit: I see that I can't get 16-byte alignment with aligned_storage. But why does this snippet work?
#include <iostream>
#include <type_traits>
#include <xmmintrin.h>
using namespace std;
__declspec(align(16)) struct float_pack {
float x[4];
};
int main()
{
cout << alignment_of<float_pack>::value << endl;
}
The output is 16. Does that mean the compiler can provide a larger alignment when using extensions? Why can't I achieve the same result with aligned_storage? Is it only because MSVC doesn't provide that with aligned_storage?

It looks like std::max_align_t is 8, see it live:
std::cout << alignment_of<std::max_align_t>::value << '\n';
In the draft C++ standard section 3.11 Alignment it says:
A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (18.2).[...]
This says that it is the maximum alignment the implementation supports in all contexts, which seems to be backed up by this Boost doc, which says:
An extended alignment is represented by an alignment greater than alignof(std::max_align_t). It is implementation-defined whether any extended alignments are supported and the contexts in which they are supported. A type having an extended alignment requirement is an over-aligned type.
By the standard, max_align_t is tied to the fundamental alignment, which James has informed us is 8 bytes. An extension, on the other hand, does not have to stick to this as long as it is documented, and the docs for __declspec align say:
Writing applications that use the latest processor instructions
introduces some new constraints and issues. In particular, many new
instructions require that data must be aligned to 16-byte boundaries.
Additionally, by aligning frequently used data to the cache line size
of a specific processor, you improve cache performance. For example,
if you define a structure whose size is less than 32 bytes, you may
want to align it to 32 bytes to ensure that objects of that structure
type are efficiently cached.
[...]
Without __declspec(align(#)), Visual C++ aligns data on natural
boundaries based on the size of the data, for example 4-byte integers
on 4-byte boundaries and 8-byte doubles on 8-byte boundaries. Data in
classes or structures is aligned within the class or structure at the
minimum of its natural alignment and the current packing setting (from #pragma pack or the /Zp compiler option).

std::aligned_storage defines a type of size Len, with the alignment requirement you provide. If you ask for an unsupported alignment, your program is ill-formed.
template <std::size_t Len, std::size_t Align
= default-alignment > struct aligned_storage;
Len shall not be zero. Align shall be equal to alignof(T) for some type T or to default-alignment.
The value of default-alignment shall be the most stringent alignment requirement for any C++ object type whose size is no greater than Len (3.9). The member typedef type shall be a POD type suitable for use as uninitialized storage for any object whose size is at most Len and whose alignment is a divisor of Align.
[ Note: A typical implementation would define aligned_storage as:
template <std::size_t Len, std::size_t Alignment>
struct aligned_storage {
typedef struct {
alignas(Alignment) unsigned char __data[Len];
} type;
};
—end note ]
And for alignas:
7.6.2 Alignment specifier [dcl.align]
1 An alignment-specifier may be applied to a variable or to a class data member, but it shall not be applied to a bit-field, a function parameter, the formal parameter of a catch clause (15.3), or a variable declared with the register storage class specifier. An alignment-specifier may also be applied to the declaration of a class or enumeration type. An alignment-specifier with an ellipsis is a pack expansion (14.5.3).
2 When the alignment-specifier is of the form alignas( assignment-expression ):
— the assignment-expression shall be an integral constant expression
— if the constant expression evaluates to a fundamental alignment, the alignment requirement of the
declared entity shall be the specified fundamental alignment
— if the constant expression evaluates to an extended alignment and the implementation supports that
alignment in the context of the declaration, the alignment of the declared entity shall be that alignment
— if the constant expression evaluates to an extended alignment and the implementation does not support
that alignment in the context of the declaration, the program is ill-formed
— if the constant expression evaluates to zero, the alignment specifier shall have no effect
— otherwise, the program is ill-formed.
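For comparison, here is a small sketch of the same 16-byte requirement written with alignas directly, assuming a compiler that implements alignas and supports 16 as an extended alignment in these contexts (where it does not, the rules quoted above make the program ill-formed). `float_pack_storage` is a hypothetical name for the raw-storage variant.

```cpp
// Over-aligned aggregate: alignas on the class plays the role of the
// __declspec(align(16)) extension from the question.
struct alignas(16) float_pack {
    float x[4];
};

// Raw-storage variant, modelled on the "typical implementation" note above.
struct float_pack_storage {
    alignas(16) unsigned char data[4 * sizeof(float)];
};

static_assert(alignof(float_pack) == 16, "float_pack should be 16-byte aligned");
static_assert(alignof(float_pack_storage) == 16, "storage should be 16-byte aligned");
static_assert(sizeof(float_pack_storage) >= sizeof(float_pack),
              "storage should be able to hold a float_pack");
```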

C/C++ Pointer to a POD struct also points to the 1st struct member

Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.

For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a slightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
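A short sketch of how the struct from the question can be checked against this guarantee. `increment_bytes` is the question's loop under a hypothetical name, and `struct_pointer_aliases_first_member` is a hypothetical self-check added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct test {
    unsigned char array_a[2];
    unsigned char array_b[5];
};

// The guarantee quoted above applies to standard-layout types only.
static_assert(std::is_standard_layout<test>::value, "test must be standard-layout");
static_assert(offsetof(test, array_a) == 0, "no padding before the first member");

// Same loop as in the question: increments every byte of the object.
inline void increment_bytes(unsigned char *bytes, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        ++bytes[i];
    }
}

// Hypothetical self-check: the struct's address is the address of array_a,
// and the byte-wise loop reaches both members.
inline bool struct_pointer_aliases_first_member() {
    test var = {{1, 2}, {3, 4, 5, 6, 7}};
    unsigned char *bytes = reinterpret_cast<unsigned char *>(&var);
    if (bytes != &var.array_a[0]) {
        return false;
    }
    increment_bytes(bytes, sizeof(test));
    return var.array_a[0] == 2 && var.array_b[4] == 8;
}
```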

From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
Reference (see page 103)

The compiler is free to add padding and reorganize the struct how it sees fit. Especially in C++ you can add (virtual) functions and then chances are that the virtual table is hidden before that. But of course those are implementation details.
For C this assumption is valid.

For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So a struct like
struct test{
char ch;
int i;
};
will have ch at offset 0, then three padding bytes, then i at offset 4, for a total size of 8 bytes, assuming a typical platform where int is 4 bytes with 4-byte alignment. Padding is also added at the end of a struct when needed, so that its size is a multiple of its alignment.
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.
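The layout of a char followed by an int can be inspected with offsetof rather than computed by hand. A minimal sketch, where `test_layout` and `padding_before_i` are hypothetical names; the asserted properties hold on any conforming implementation, while the concrete numbers in the comments are only what a typical ABI with a 4-byte, 4-byte-aligned int produces.

```cpp
#include <cstddef>

// Same members as the char/int struct discussed above, under a hypothetical name.
struct test_layout {
    char ch;
    int i;
};

static_assert(offsetof(test_layout, ch) == 0, "first member sits at offset 0");
static_assert(offsetof(test_layout, i) % alignof(int) == 0, "i is suitably aligned");
static_assert(sizeof(test_layout) % alignof(test_layout) == 0,
              "size is a multiple of the struct's alignment");

// Hypothetical helper: number of padding bytes between ch and i.
// Typically 3 (offset of i is 4, sizeof(test_layout) is 8).
constexpr std::size_t padding_before_i() {
    return offsetof(test_layout, i) - sizeof(char);
}
```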

Alignment of structure members for heap allocated variables

I have a structure where the members have certain alignment requirements while no such requirement exist for the structure itself.
I'm using gcc so using __attribute__((aligned(n))) will do the trick, unless (as far as I know) an instance of the struct is allocated on the heap.
How do I keep the alignment for heap allocated instances? posix_memalign(3) will align the structure itself, but not the structure members, so I can't see how to make it work with that function.
The source is here: https://github.com/colding/disruptorC/blob/master/src/disruptor.h#L92

No matter where a struct is—stack or heap—the layout of the struct must be the same. The compiler ensures that the sizeof() and the layout of elements within the struct match the alignment requirements (via padding). It also gives the struct itself a required alignment so that its members end up on the right boundary (this value is the largest alignment of any of its members).
So just use posix_memalign and you'll be fine:
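void* raw = NULL; /* posix_memalign takes a void**, so allocate through a void* first */
if (posix_memalign(&raw, alignof(MyStruct), sizeof(MyStruct)) != 0) { /* handle allocation failure */ }
MyStruct* ptr = (MyStruct*)raw;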
For example, if you have this definition:
struct MyStruct {
char c;
double d;
};
It's compiler-dependent, of course, but the most likely behavior is that the compiler lays out the following:
1-byte char
7 bytes of padding
8-byte double
and gives the whole thing an alignment of 8 bytes. Then, if the struct itself is aligned properly (on an 8-byte boundary), the double that's 8 bytes offset into it will also be properly aligned.
(alignof is different in different compilers/standards: __alignof__ in gcc, __alignof in MSVC, and alignof in C11/C++11.)
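To illustrate with a member that has a stricter requirement than its natural alignment, here is a sketch assuming a POSIX system and C++11. `Slot`, `allocate_slot` and `is_aligned` are hypothetical names, and alignas(64) stands in for the question's `__attribute__((aligned(n)))`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free (POSIX)

// Hypothetical struct with one over-aligned member; the struct inherits
// the strictest member alignment.
struct Slot {
    char tag;
    alignas(64) long value;
};

static_assert(alignof(Slot) == 64, "struct takes the alignment of its strictest member");
static_assert(offsetof(Slot, value) % 64 == 0, "member offset preserves its alignment");

// Allocate one Slot on the heap at an address that satisfies alignof(Slot).
// posix_memalign needs a power-of-two alignment that is a multiple of
// sizeof(void*); 64 satisfies that on common platforms.
inline Slot *allocate_slot() {
    void *raw = nullptr;
    if (posix_memalign(&raw, alignof(Slot), sizeof(Slot)) != 0) {
        return nullptr;
    }
    return static_cast<Slot *>(raw);
}

inline bool is_aligned(const void *p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Because the member's offset is itself a multiple of 64, aligning the whole struct is enough to align the member; memory obtained this way is released with free.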

Is the size of a struct required to be an exact multiple of the alignment of that struct?

Once again, I'm questioning a longstanding belief.
Until today, I believed that the alignment of the following struct would normally be 4 and the size would normally be 5...
struct example
{
int m_Assume_32_Bits;
char m_Assume_8_Bit_Bytes;
};
Because of this assumption, I have data structure code that uses offsetof to determine the distance in bytes between two adjacent items in an array. Today, I spotted some old code that was using sizeof where it shouldn't, couldn't understand why I hadn't had bugs from it, coded up a unit test - and the test surprised me by passing.
A bit of investigation showed that the sizeof the type I used for the test (similar to the struct above) was an exact multiple of the alignment - ie 8 bytes. It had padding after the final member. Here is an example of why I never expected this...
struct example2
{
example m_Example;
char m_Why_Cant_This_Be_At_Offset_6_Bytes;
};
A bit of Googling showed examples that make it clear that this padding after the final member is allowed - for example http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding (the "or at the end of the structure" bit).
This is a bit embarrassing, as I recently posted this comment - Use of struct padding (my first comment to that answer).
What I can't seem to determine is whether this padding to an exact multiple of the alignment is guaranteed by the C++ standard, or whether it is just something that is permitted and that some (but maybe not all) compilers do.
So - is the size of a struct required to be an exact multiple of the alignment of that struct according to the C++ standard?
If the C standard makes different guarantees, I'm interested in that too, but the focus is on C++.

5.3.3/2
When applied to a class, the result [of sizeof] is the number of bytes in an object of that class, including any padding required for placing objects of that type in an array.
So yes, object size is a multiple of its alignment.

One definition of alignment size:
The alignment size of a struct is the offset from one element to the next element when you have an array of that struct.
By its nature, if you have an array of a struct with two elements, then both need to have aligned members, so that means that yes, the size has to be a multiple of the alignment. (I'm not sure if any standard explicitly enforces this, but because the size and alignment of a struct don't depend on whether the struct is alone or inside an array, the same rules apply to both, so it can't really be any other way.)

The standard says (section [dcl.array]):
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
Therefore there is no padding between array elements.
Padding inside structures is not required by the standard, but the standard doesn't permit any other way of aligning array elements.

I am unsure if this is in the actual C/C++ standard, and I am inclined to say that it is up to the compiler (just to be on the safe side). However, I had a "fun" time figuring that out a few months ago, where I had to send dynamically generated C structs as byte arrays across a network as part of a protocol, to communicate with a chip. The alignment and size of all the structs had to be consistent with the structs in the code running on the chip, which was compiled with a variant of GCC for the MIPS architecture. I'll attempt to give the algorithm, and it should apply to all variants of gcc (and hopefully most other compilers).
All base types, like char, short and int align to their size, and they align to the next available position, regardless of the alignment of the parent. And to answer the original question, yes the total size is a multiple of the alignment.
// size 8
struct {
char A; //byte 0
char B; //byte 1
int C; //byte 4
};
Even though the alignment of the struct is 4 bytes, the chars are still packed as close as possible.
The alignment of a struct is equal to the largest alignment of its members.
Example:
//size 4, but alignment is 2!
struct foo {
char A; //byte 0
char B; //byte 1
short C; //byte 2
};
//size 6
struct bar {
char A; //byte 0
struct foo B; //byte 2
};
This also applies to unions, and in a curious way. The size of a union can be larger than any of the sizes of its members, simply due to alignment:
//size 3, alignment 1
struct foo {
char A; //byte 0
char B; //byte 1
char C; //byte 2
};
//size 2, alignment 2
struct bar {
short A; //byte 0
};
//size 4! alignment 2
union foobar {
struct foo A;
struct bar B;
};
Using these simple rules, you should be able to figure out the alignment/size of any horribly nested union/struct you come across. This is all from memory, so if I have missed a corner case that can't be decided from these rules please let me know!
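The union case can be checked at compile time. In this sketch the types are renamed (`three_chars`, `one_short`, `both`) so they can live in one translation unit, and `round_up` is a hypothetical helper; the asserted properties are the general ones, and the 3/2/4 sizes in the comments are what a typical ABI with a 2-byte short gives.

```cpp
#include <cstddef>

struct three_chars { char a, b, c; };   // typically size 3, alignment 1
struct one_short { short a; };          // typically size 2, alignment 2
union both {                            // typically size 4, alignment 2
    three_chars A;
    one_short B;
};

// Hypothetical helper: round size up to the next multiple of alignment.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// A union is at least as large and as aligned as each member, and its
// size is padded to a multiple of its own alignment.
static_assert(sizeof(both) >= sizeof(three_chars), "holds the larger member");
static_assert(alignof(both) >= alignof(one_short), "takes the stricter alignment");
static_assert(sizeof(both) % alignof(both) == 0, "size padded to alignment");
```

With the typical values, `round_up(3, 2)` is 4, which is how the union ends up larger than either of its members.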

C++ doesn't explicitly say so, but it is a consequence of two other requirements:
First, all objects must be well-aligned.
3.8/1 says
The lifetime of an object of type T begins when [...] storage with the proper alignment and size for type T is obtained
and 3.9/5:
Object types have alignment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
So every object must be aligned according to its alignment requirements.
The other requirement is that objects in an array are allocated contiguously:
8.3.4/1:
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
For the objects in an array to be contiguously allocated, there can be no padding between them. But for every object in the array to be properly aligned, each individual object must be padded so that the byte immediately after the end of the object is also well aligned. In other words, the size of the object must be a multiple of its alignment.
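Both halves of that argument can be checked in code. A minimal sketch using the struct from the question; `array_stride` is a hypothetical helper that measures the distance in bytes between two adjacent array elements.

```cpp
#include <cassert>
#include <cstddef>

struct example {
    int m_Assume_32_Bits;
    char m_Assume_8_Bit_Bytes;
};

// The size includes trailing padding, so it is a multiple of the alignment.
static_assert(sizeof(example) % alignof(example) == 0,
              "sizeof must be a multiple of alignof");

// Hypothetical helper: byte distance between adjacent elements of an array.
inline std::ptrdiff_t array_stride() {
    example items[2] = {};
    return reinterpret_cast<const char *>(&items[1]) -
           reinterpret_cast<const char *>(&items[0]);
}
```

Since array elements are contiguous, the stride equals sizeof(example), which is why sizeof rather than an offsetof-based distance is the right step between adjacent items.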

So to split your question up into two:
1. Is it legal?
[5.3.3.2] When applied to a class, the result [of the sizeof() operator] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
So, no, it's not.
2. Well, why isn't it?
Here, I can only speculate.
2.1. Pointer arithmetic gets weirder
If alignment applied "between array elements" but did not affect the size, things would get needlessly complicated, e.g.
(char *)(X+1) != ((char *)X) + sizeof(*X)
(I have a hunch that this is implicitly required by the standard even without the statement above, but I can't prove it.)
2.2 Simplicity
If alignment affects size, alignment and size can be decided by looking at a single type. Consider this:
struct A { int x; char y; };
struct B { A left, right; };
With the current standard, I just need to know sizeof(A) to determine size and layout of B.
With the alternate you suggest I need to know the internals of A. Similar to your example2: for a "better packing", sizeof(example) is not enough, you need to consider the internals of example.

It is possible to produce a C or C++ typedef whose size is not a multiple of its alignment. This came up recently in this bindgen bug. Here's a minimal example, which I'll call test.c below:
#include <stdio.h>
#include <stdalign.h>
__attribute__ ((aligned(4))) typedef struct {
char x[3];
} WeirdType;
int main() {
printf("sizeof(WeirdType) = %zu\n", sizeof(WeirdType));
printf("alignof(WeirdType) = %zu\n", alignof(WeirdType));
return 0;
}
On my Arch Linux x86_64 machine, gcc -dumpversion && gcc test.c && ./a.out prints:
9.3.0
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Similarly clang -dumpversion && clang test.c && ./a.out prints:
9.0.1
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Saving the file as test.cc and using g++/clang++ gives the same result. (Update from a couple years later: I get the same results from GCC 11.1.0 and Clang 13.0.0.)
Notably however, MSVC on Windows does not seem to reproduce any behavior like this.

The standard says very little about padding and alignment. Very little is guaranteed. About the only thing you can bet on is that the first element is at the beginning of the structure. After that...alignment and padding can be anything.

Seems the C++03 standard didn't say (or I didn't find) whether the alignment padding bytes should be included in the object representation.
And the C99 standard says the "sizeof" a struct type or union type includes internal and trailing padding, but I'm not sure if all alignment padding is included in that "trailing padding".
Now back to your example. There is really no confusion. sizeof(example) == 8 means the structure does take 8 bytes to represent itself, including the 3 trailing padding bytes. If the char in the second structure had an offset of 6, it would overwrite the space used by m_Example. The layout of a certain type is implementation-defined, and should be kept stable across the whole implementation.
Still, whether p+1 equals (T*)((char*)p + sizeof(T)) is unclear to me, and I'm hoping to find the answer.
