Comparison semantics with std::atomic types

I'm trying to find where the comparison semantics for the type T with std::atomic is defined.
I know that besides the built-in specializations for integral types, T can be any TriviallyCopyable type. But how do operations like compare_exchange_X know how to compare an instance of T?
I imagine they must simply do a byte by byte comparison of the user defined object (like a memcmp) but I don't see where in the standard this is explicitly mentioned.
So, suppose I have:
struct foo
{
    std::uint64_t x;
    std::uint64_t y;
};
How does the compiler know how to compare two std::atomic<foo> instances when I call std::atomic<foo>::compare_exchange_weak()?

In draft n3936, memcmp semantics are explicitly described in section 29.6.5.
Note: For example, the effect of atomic_compare_exchange_strong is
if (memcmp(object, expected, sizeof(*object)) == 0)
    memcpy(object, &desired, sizeof(*object));
else
    memcpy(expected, object, sizeof(*object));
and
Note: The memcpy and memcmp semantics of the compare-and-exchange operations may result in failed comparisons for values that compare equal with operator== if the underlying type has padding bits, trap bits, or alternate representations of the same value.
That wording has been present at least since n3485.
Note that only memcmp(p1, p2, sizeof(T)) != 0 is meaningful to compare_exchange_weak: in that case failure is guaranteed. memcmp(p1, p2, sizeof(T)) == 0 allows success but does not guarantee it, because the weak form may fail spuriously.
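A minimal sketch of how these semantics look from the caller's side. The pair32 struct and the try_update helper are hypothetical names, not from the question; a struct of two std::uint32_t members is used instead of foo on the assumption that an 8-byte object stays lock-free without linking an extra atomics library. The struct has no padding, so bytewise and memberwise equality coincide here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 8-byte payload with no padding bits.
struct pair32 {
    std::uint32_t x;
    std::uint32_t y;
};

// Attempts to replace the stored value with `desired`.
// The comparison is done on the object representation (memcmp-like).
// On failure, `expected` is overwritten with the value actually stored.
bool try_update(std::atomic<pair32>& target, pair32& expected, pair32 desired) {
    return target.compare_exchange_strong(expected, desired);
}
```

After a failed call, the caller can inspect expected to see the stored value and decide whether to retry.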

It's implementation defined. It could just be using a mutex lock or it could be using some intrinsics on memory blobs. The standard simply defines it such that the latter might work as an implementation strategy.
The compiler doesn't know anything here. It'll all be in the library. Since it's a template you can go read how your implementation does it.

Related

C++ standard for member offsets of standard layout struct

Does the C++11 standard guarantee that all compilers will choose the same memory offsets for all members in a given standard layout struct, assuming all members have guaranteed sizes (e.g. int32_t instead of int)?
That is, for a given member in a standard layout struct, does C++11 guarantee that offsetof will give the same value across all compilers?
If so, is there any specification of what that value would be, e.g. as a function of size, alignment, and order of the struct members?

There is no guarantee that offsetof will yield the same values across compilers.
There are guarantees about minimum sizes of types (e.g., char >= 8 bits, short, int >= 16 bits, long >= 32 bits, long long >= 64 bits), and the relationship between sizes [1] (sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)).
For most types, it's guaranteed that the alignment requirement is no greater than the size of the type.
For any struct/class, the first (non-static [2]) element must be at the beginning of the class/struct, and in the absence of changes to the visibility, order is guaranteed to be in the order of definition. For example:
struct { // same if you use `class`
    int a;
    int b;
};
Since these are both public, a and b must be in that order. But:
struct {
    int a;
    int b;
private:
    int c;
};
The first element (a) is required to be at the beginning of the struct, but because of the change from public to private, the compiler is (theoretically) allowed to arrange c before b.
This rule has changed over time though. In C++98, even a vacuous visibility specifier allowed rearrangement of members.
struct A {
    int a;
    int b;
public:
    int c;
};
The public allows rearranging b and c even though they're both public. Since then it's been tightened up so it applies only to elements with differing visibility, and in C++23 the whole idea of rearranging elements based on visibility is gone (long past time, in my opinion: I don't think anybody ever used it, so it was always a rule you sort of needed to know but that did nobody any real good).
[1] If you want to get really technical, the requirement isn't really on the size but on the range, so in theory the relationship between sizes isn't quite guaranteed, but for most practical purposes it is.
[2] A static element isn't normally allocated as part of the class/struct object at all. A static member is basically allocated as a global variable, but with some extra rules about visibility of its name.

No, there are no such guarantees. The C++ standard explicitly provides for type-specific padding and alignment requirements, for one thing, and that automatically dissolves this kind of guarantee.
It might be reasonable to anticipate uniform padding and alignment requirements for a specific hardware platform, that all compilers on that platform will implement, but that again is not guaranteed.

Absolutely not. Memory layout is completely up to the C++ implementation. The only exception, only for standard-layout classes, is that the first non-static data member or base class subobject(s) have zero offset. There are also some other constraints, e.g. due to sizes and alignment of subobjects and constraints on ordering of addresses of subobjects, but nothing that determines concrete offsets of subobjects.
However, typically compilers follow some ABI specification on any given architecture/platform, so that compilers for the same architecture/platform will likely use the same ABI and same memory layout (e.g. the SysV x86-64 ABI together with the Itanium C++ ABI on Linux x86-64 at least for both GCC and Clang).
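The few guarantees that do exist can be checked at compile time. The following is a minimal sketch; the struct sample and the helper offset_of_c are hypothetical names chosen for illustration. Only the asserted properties follow from the standard, while the concrete offset of c is left to the ABI and is therefore returned for inspection rather than asserted.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical standard-layout struct with fixed-width members.
struct sample {
    std::int32_t a;
    std::int8_t  b;
    std::int32_t c;
};

static_assert(std::is_standard_layout<sample>::value,
              "sample must be standard-layout for these checks");
// Guaranteed: the first non-static data member has offset zero.
static_assert(offsetof(sample, a) == 0, "first member at offset 0");
// Guaranteed: members with the same access control are laid out in
// declaration order, so their offsets increase.
static_assert(offsetof(sample, a) < offsetof(sample, b), "a before b");
static_assert(offsetof(sample, b) < offsetof(sample, c), "b before c");

// Not fixed by the standard: the exact offset depends on the padding
// the implementation inserts between b and c.
constexpr std::size_t offset_of_c() {
    return offsetof(sample, c);
}
```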

reinterpret_cast usage to manipulate bytes

I was reading here about how to implement the byteswap function. I don't understand why bit_cast is actually needed instead of a reinterpret_cast to char*. My understanding is that such a cast does not violate the strict aliasing rule. I read that the second version below could be wrong because it accesses unaligned memory. That may be so, but I'm confused: if the access is UB due to unaligned memory, when is it possible to manipulate bytes with reinterpret_cast at all? According to the standard, the cast should allow reading and writing the memory.
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
    static_assert(std::has_unique_object_representations_v<T>,
                  "T may not have padding bits");
    auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
    std::ranges::reverse(value_representation);
    return std::bit_cast<T>(value_representation);
}
template<std::integral T>
void byteswap(T& value) noexcept
{
    static_assert(std::has_unique_object_representations_v<T>,
                  "T may not have padding bits");
    // Cast the address of value; casting the integer itself to char* would be wrong.
    char* value_representation = reinterpret_cast<char*>(&value);
    std::reverse(value_representation, value_representation + sizeof(T));
}

The primary reason is that reinterpret_cast can not be used in constant expression evaluation, while std::bit_cast can. And std::byteswap is specified to be constexpr.
If you added constexpr to the declaration in your implementation, it would be ill-formed, no diagnostic required, because there is no specialization of it that could be called as subexpression of a constant expression.
Without the constexpr it is not ill-formed, but cannot be called as subexpression of a constant expression, which std::byteswap is supposed to allow.
Furthermore, there is a defect in the standard:
The standard technically does not allow doing pointer arithmetic on the reinterpret_cast<char*>(&value) pointer (and also doesn't really specify a meaning for reading and writing through such a pointer).
The intention is that the char* pointer should be a pointer into the object representation of the object, considered as an array of characters. But currently the standard just says that the reinterpret_cast<char*>(&value) pointer still points to the original object, not its object representation. See P1839 for a paper proposing to correct the specification to be more in line with the usual assumptions.
The implementation from cppreference is also making an assumption that might not be guaranteed to be true: Whether std::array<std::byte, sizeof(T)> is guaranteed to have the same size as T. Of course that should hold in practice and std::bit_cast will fail to compile if it doesn't.
If you want to read some discussion on whether or not it is guaranteed in theory, see the questions std::bit_cast with std::array, Is the size of std::array defined by standard and What is the sizeof std::array<char, N>?
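When constexpr is not required, a std::memcpy-based variant sidesteps the pointer-arithmetic question entirely and also compiles before C++20. This is only a sketch of that alternative, not the cppreference implementation; the name byteswap_memcpy is hypothetical, and the type is assumed to have no padding bits.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Runtime-only byte swap: copy the object representation into a byte
// buffer, reverse the buffer, and copy it back.
template <typename T>
T byteswap_memcpy(T value) noexcept {
    static_assert(std::is_integral<T>::value, "T must be an integral type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // well defined for trivially copyable T
    std::reverse(bytes, bytes + sizeof(T));  // pointer arithmetic on a real array
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}
```

The trade-off is the one described above: std::memcpy cannot be used in a constant expression, so this version cannot stand in for a constexpr std::byteswap.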

Do `uintptr_t` types have a defined total ordering?

Pointers in C++ do not have a defined total ordering unless they fall within a narrow set of criteria, such as being all parts of the same subobject or array (expr.rel/4, defns.order.ptr).
In fact, to get even a basic ordering guarantee for pointers from different subobjects, you can't use operator< but must use std::less instead -- otherwise the result is unspecified.
From comparisons.general/2:
For templates less, greater, less_equal, and greater_equal, the specializations for any pointer type yield a result consistent with the implementation-defined strict total order over pointers (defns.order.ptr).
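To illustrate the guarantee quoted above, here is a minimal sketch; the helper name ordered_before is hypothetical. It only shows that std::less yields a consistent order for pointers to unrelated objects, and says nothing about integers obtained from those pointers.

```cpp
#include <functional>

// Compares two pointers using the implementation-defined strict total
// order. Unlike a raw `p < q`, the result is consistent even when p and
// q point to unrelated objects.
bool ordered_before(const int* p, const int* q) {
    return std::less<const int*>{}(p, q);
}
```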
However, in C++ we also have the (optional) type std::uintptr_t -- which is defined to be an unsigned integer type large enough to store a pointer, with the property that it's capable of surviving a round trip from void* to std::uintptr_t and back to void* without any loss of data. Unsigned integer types also have a well-defined total ordering -- which leads me to my question.
Is it at all reasonable to expect that a std::uintptr_t from different sources have a well-defined ordering, within the definition of the C++ Abstract Machine?
I'm not asking whether this would work in practice, but whether it is even feasible to assume that this is well-defined behavior (my assumption is that it is not).
As a concrete example of a potential application of this, I am interested whether something like the following is formally well-defined behavior:
template <typename T>
auto my_typeid() -> std::uintptr_t
{
    // Each 'char' has a unique address since it's static and part of
    // each unique template instantiation
    static const char s_data = 0;
    // Use this address for an ordering system, and for identity
    return reinterpret_cast<std::uintptr_t>(&s_data);
}

std::bit_cast with std::array

In his recent talk “Type punning in modern C++” Timur Doumler said that std::bit_cast cannot be used to bit cast a float into an unsigned char[4] because C-style arrays cannot be returned from a function. We should either use std::memcpy or wait until C++23 (or later) when something like reinterpret_cast<unsigned char*>(&f)[i] will become well defined.
In C++20, can we use an std::array with std::bit_cast,
float f = /* some value */;
auto bits = std::bit_cast<std::array<unsigned char, sizeof(float)>>(f);
instead of a C-style array to get bytes of a float?

Yes, this works on all major compilers, and as far as I can tell from looking at the standard, it is portable and guaranteed to work.
First of all, std::array<unsigned char, sizeof(float)> is guaranteed to be an aggregate (https://eel.is/c++draft/array#overview-2). From this follows that it holds exactly a sizeof(float) number of chars inside (typically as a char[], although afaics the standard doesn't mandate this particular implementation - but it does say the elements must be contiguous) and cannot have any additional non-static members.
It is therefore trivially copyable, and its size matches that of float as well.
Those two properties allow you to bit_cast between them.

The accepted answer is incorrect because it fails to consider alignment and padding issues.
Per [array]/1-3:
The header <array> defines a class template for storing fixed-size
sequences of objects. An array is a contiguous container. An instance
of array<T, N> stores N elements of type T, so that size() == N is an invariant.
An array is an aggregate that can be list-initialized with up to N
elements whose types are convertible to T.
An array meets all of the requirements of a container and of a
reversible container ([container.requirements]), except that a default
constructed array object is not empty and that swap does not have
constant complexity. An array meets some of the requirements of a
sequence container. Descriptions are provided here only for operations
on array that are not described in one of these tables and for
operations where there is additional semantic information.
The standard does not actually require std::array to have exactly one public data member of type T[N], so in theory it is possible that sizeof(To) != sizeof(From) or that is_trivially_copyable_v<To> is false.
I will be surprised if this doesn't work in practice, though.
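The theoretical concern can be turned into a compile-time check, so that a non-conforming layout is reported as a build error rather than assumed away. A minimal sketch, with array_bytes_match as a hypothetical helper name:

```cpp
#include <array>
#include <type_traits>

// True when an array of sizeof(T) bytes has exactly the size of T and is
// trivially copyable, i.e. when it is usable as a bit_cast target for T.
template <class T>
constexpr bool array_bytes_match() {
    using bytes = std::array<unsigned char, sizeof(T)>;
    return sizeof(bytes) == sizeof(T) &&
           std::is_trivially_copyable<bytes>::value;
}

// Fails to compile on an implementation where the assumption does not hold.
static_assert(array_bytes_match<float>(),
              "std::array<unsigned char, sizeof(float)> must match float");
```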

Yes.
According to the paper that describes the behaviour of std::bit_cast and its proposed implementation, the cast should succeed as long as both types have the same size and are trivially copyable.
A simplified implementation of std::bit_cast should be something like:
template <class Dest, class Source>
inline Dest bit_cast(Source const &source) {
    static_assert(sizeof(Dest) == sizeof(Source));
    static_assert(std::is_trivially_copyable<Dest>::value);
    static_assert(std::is_trivially_copyable<Source>::value);
    Dest dest;
    std::memcpy(&dest, &source, sizeof(dest));
    return dest;
}
Since a float (4 bytes) and an array of unsigned char with sizeof(float) elements satisfy all those asserts, the underlying std::memcpy will be carried out. Therefore, each element in the resulting array will be one consecutive byte of the float.
In order to prove this behaviour, I wrote a small example in Compiler Explorer that you can try here: https://godbolt.org/z/4G21zS. The float 5.0 is properly stored as an array of bytes (0x40a00000) that corresponds to the hexadecimal representation of that float number in Big Endian.

ARM Neon: How to convert from uint8x16_t to uint8x8x2_t?

I recently discovered about the vreinterpret{q}_dsttype_srctype casting operator. However this doesn't seem to support conversion in the data type described at this link (bottom of the page):
Some intrinsics use an array of vector types of the form:
<type><size>x<number of lanes>x<length of array>_t
These types are treated as ordinary C structures containing a single
element named val.
An example structure definition is:
struct int16x4x2_t
{
    int16x4_t val[2];
};
Do you know how to convert from uint8x16_t to uint8x8x2_t?
Note that the problem cannot be reliably addressed using a union (reading from inactive members leads to undefined behaviour; edit: that's only the case for C++, while it turns out that C allows type punning this way), nor by using pointers to cast (which breaks the strict aliasing rule).

It's completely legal in C++ to type pun via pointer casting, as long as you're only doing it to char*. This, not coincidentally, is what memcpy is defined as working on (technically unsigned char* which is good enough).
Kindly observe the following passage:
For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes (1.7) making up the object can be copied into
an array of char or unsigned char.
If the content of the array of char or unsigned char is copied back
into the object, the object shall subsequently hold its original
value. [Example:
#define N sizeof(T)
char buf[N];
T obj;
// obj initialized to its original value
std::memcpy(buf, &obj, N);
// between these two calls to std::memcpy,
// obj might be modified
std::memcpy(&obj, buf, N);
// at this point, each subobject of obj of scalar type
// holds its original value
— end example ]
Put simply, copying like this is the intended function of std::memcpy. As long as the types you're dealing with meet the necessary triviality requirements, it's totally legit.
Strict aliasing does not include char* or unsigned char*: you are free to alias any type with these.
Note that for unsigned ints specifically, you have some very explicit leeway here. The C++ Standard requires that they meet the requirements of the C Standard, and the C Standard mandates the format. The only way that trap representations or anything like that can be involved is if your implementation has padding bits, but ARM does not have any: 8-bit bytes, and 8-bit and 16-bit integers. So for unsigned integers on implementations with zero padding bits, any byte pattern is a valid unsigned integer.
For unsigned integer types other than unsigned char, the bits
of the object representation shall be divided into two groups:
value bits and padding bits (there need not be any of the
latter). If there are N value bits, each bit shall represent
a different power of 2 between 1 and 2^(N−1), so that objects
of that type shall be capable of representing values from 0
to 2^N − 1 using a pure binary representation; this shall be
known as the value representation. The values of any padding bits are
unspecified.
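A minimal sketch of the memcpy approach described above. Because <arm_neon.h> is only available on ARM targets, the types vec16, vec8 and vec8x2 are hypothetical stand-ins that merely mirror the sizes of uint8x16_t, uint8x8_t and uint8x8x2_t, and the helper name copy_bytes is likewise an assumption. With the real NEON types, which half ends up in val[0] may depend on the platform's register and memory layout.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Stand-ins for the NEON types (assumption: same sizes, no padding).
struct vec16  { std::uint8_t lanes[16]; };  // models uint8x16_t
struct vec8   { std::uint8_t lanes[8]; };   // models uint8x8_t
struct vec8x2 { vec8 val[2]; };             // models uint8x8x2_t

// Copies the bytes of `from` into a fresh object of type To.
template <class To, class From>
To copy_bytes(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

The destination is an ordinary object of the target type, so reading it afterwards does not go through a pointer of the wrong type.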

Based on your comments, it seems you want to perform a bona fide conversion -- that is, to produce a distinct, new, separate value of a different type. This is a very different thing than a reinterpretation, such as the lead-in to your question suggests you wanted. In particular, you posit variables declared like this:
uint8x16_t a;
uint8x8x2_t b;
// code to set the value of a ...
and you want to know how to set the value of b so that it is in some sense equivalent to the value of a.
Speaking to the C language:
The strict aliasing rule (C2011 6.5/7) says,
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
a type compatible with the effective type of the object, [...]
an aggregate or union type that includes one of the aforementioned types among its members [...], or
a character type.
(Emphasis added. Other enumerated options involve differently-qualified and differently-signed versions of the effective type of the object or compatible types; these are not relevant here.)
Note that these provisions never interfere with accessing a's value, including the member value, via variable a, and similarly for b. But don't overlook the usage of the term "effective type" -- this is where things can get bollixed up under slightly different circumstances. More on that later.
Using a union
C certainly permits you to perform a conversion via an intermediate union, or you could rely on b being a union member in the first place so as to remove the "intermediate" part:
union {
    uint8x16_t x1;
    uint8x8x2_t x2;
} temp;
temp.x1 = a;
b = temp.x2;
Using a typecast pointer (to produce UB)
However, although it's not so uncommon to see it, C does not permit you to type-pun via a pointer:
// UNDEFINED BEHAVIOR - strict-aliasing violation
b = *(uint8x8x2_t *)&a;
// DON'T DO THAT
There, you are accessing the value of a, whose effective type is uint8x16_t, via an lvalue of type uint8x8x2_t. Note that it is not the cast that is forbidden, nor even, I'd argue, the dereferencing -- it is reading the dereferenced value so as to apply the side effect of the = operator.
Using memcpy()
Now, what about memcpy()? This is where it gets interesting. C permits the stored values of a and b to be accessed via lvalues of character type, and although its arguments are declared to have type void *, this is the only plausible interpretation of how memcpy() works. Certainly its description characterizes it as copying characters. There is therefore nothing wrong with performing a
memcpy(&b, &a, sizeof a);
Having done so, you may freely access the value of b via variable b, as already mentioned. There are aspects of doing so that could be problematic in a more general context, but there's no UB here.
However, contrast this with the superficially similar situation in which you want to put the converted value into dynamically-allocated space:
uint8x8x2_t *c = malloc(sizeof(*c));
memcpy(c, &a, sizeof a);
What could be wrong with that? Nothing is wrong with it, as far as it goes, but here you have UB if you afterward try to access the value of *c. Why? Because the memory to which c points does not have a declared type, therefore its effective type is the effective type of whatever was last stored in it (if that has an effective type), including if that value was copied into it via memcpy() (C2011 6.5/6). As a result, the object to which c points has effective type uint8x16_t after the copy, whereas the expression *c has type uint8x8x2_t; the strict aliasing rule says that accessing that object via that lvalue produces UB.

So there are a bunch of gotchas here. This reflects C++.
First, you can convert trivially copyable data to char* or unsigned char* or C++17 std::byte*, then copy it from one location to another. The result is defined behavior. The values of the bytes are unspecified.
If you do this from a value of one type to another via something like memcpy, this can result in undefined behaviour upon access of the target type unless the target type has valid values for all byte representations, or the layout of the two types is specified by your compiler.
There is the possibility of "trap representations" in the target type -- byte combinations that result in machine exceptions or something similar if interpreted as a value of that type. Imagine a system that doesn't use IEEE floats and where doing math on NaN or INF or the like causes a segfault.
There are also alignment concerns.
In C, I believe that type punning via unions is legal, with similar qualifications.
Finally, note that under a strict reading of the C++ standard, foo* pf = (foo*)malloc(sizeof(foo)); is not a pointer to a foo even if foo is plain old data. You must create an object before interacting with it, and the only way to create an object outside of automatic storage is via new or placement new. This means you must have data of the target type before you memcpy into it.
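A minimal sketch of the "create the object first" rule from the last paragraph. The struct foo here is a hypothetical two-int stand-in and sum_via_heap_copy is an illustrative helper name. As far as I know, C++20 changed this area so that malloc implicitly creates objects of such simple types, in which case the placement new below is redundant but harmless.

```cpp
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical trivially copyable type standing in for `foo`.
struct foo {
    int a;
    int b;
};

// Copies `src` into malloc'ed storage and reads it back through a foo*.
int sum_via_heap_copy(const foo& src) {
    void* raw = std::malloc(sizeof(foo));
    if (raw == nullptr) {
        throw std::bad_alloc{};
    }
    foo* pf = new (raw) foo;             // start the lifetime of a foo in the storage
    std::memcpy(pf, &src, sizeof(foo));  // now there is a foo object to copy into
    const int result = pf->a + pf->b;
    std::free(raw);                      // foo is trivially destructible, no destructor call needed
    return result;
}
```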

Do you know how to convert from uint8x16_t to uint8x8x2_t?
uint8x16_t input = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
uint8x8x2_t output = { vget_low_u8(input), vget_high_u8(input) };
One must understand that with neon intrinsics, uint8x16_t represents a 16-byte register; while uint8x8x2_t represents two adjacent 8-byte registers. For ARMv7 these may be the same thing (q0 == {d0, d1}) but for ARMv8 the register layout is different. It's necessary to get (extract) the low 8 bytes and the high 8 bytes of the single 16-byte register using two functions. The clang compiler will determine which instruction(s) are necessary based on the context.
