Existence of objects created in C functions - c++

It has been established (see the links below) that placement new is required to create objects:
int* p = (int*)malloc(sizeof(int));
*p = 42; // illegal: no int object exists at *p yet
Yet that is a pretty standard way of creating objects in C.
The question is, does the int exist if it is created in C, and returned to C++?
In other words, is the following guaranteed to be legal? Assume int is the same for C and C++.
foo.h
#ifdef __cplusplus
extern "C" {
#endif
int* foo(void);
#ifdef __cplusplus
}
#endif
foo.c
#include "foo.h"
#include <stdlib.h>
int* foo(void) {
    return malloc(sizeof(int));
}
main.cpp
#include "foo.h"
#include <cstdlib>
int main() {
    int* p = foo();
    *p = 42;
    std::free(p);
}
Links to discussions on the mandatory nature of placement new:
Is placement new legally required for putting an int into a char array?
https://stackoverflow.com/a/46841038/4832499
https://groups.google.com/a/isocpp.org/forum/#!msg/std-discussion/rt2ivJnc4hg/Lr541AYgCQAJ
https://www.reddit.com/r/cpp/comments/5fk3wn/undefined_behavior_with_reinterpret_cast/dal28n0/
reinterpret_cast creating a trivially default-constructible object

Yes! But only because int is a fundamental type. Its initialization is a vacuous operation:
[dcl.init]/7:
To default-initialize an object of type T means:
— If T is a (possibly cv-qualified) class type, constructors are considered. The applicable constructors are enumerated ([over.match.ctor]), and the best one for the initializer () is chosen through overload resolution. The constructor thus selected is called, with an empty argument list, to initialize the object.
— If T is an array type, each element is default-initialized.
— Otherwise, no initialization is performed.
Emphasis mine. Since "not initializing" an int is akin to default-initializing it, its lifetime begins once storage is allocated:
[basic.life]/1:
The lifetime of an object or reference is a runtime property of the
object or reference. An object is said to have non-vacuous
initialization if it is of a class or aggregate type and it or one of
its subobjects is initialized by a constructor other than a trivial
default constructor. The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained, and
— if the object has non-vacuous initialization, its initialization is complete.
Allocation of storage can be done in any way acceptable to the C++ standard. Yes, even by just calling malloc. Compiling C code with a C++ compiler would be a very bad idea otherwise. And yet, the C++ FAQ has been suggesting it for years.
In addition, since the C++ standard defers to the C standard where malloc is concerned, I think that wording should be brought forth as well. And here it is:
7.22.3.4 The malloc function - Paragraph 2:
The malloc function allocates space for an object whose size is
specified by size and whose value is indeterminate.
The "value is indeterminate" part kinda indicates there's an object there. Otherwise, how could it have any value, let alone an indeterminate one?

I think the question is badly posed. In C++ we only have the concepts of translation units and linkage, the latter simply meaning under which circumstances names declared in different TUs refer to the same entity or not.
Virtually nothing is said about the linking process as such, whose correctness must be guaranteed by the compiler/linker anyway; even if the code snippets above were purely C++ sources (with malloc replaced by a nice new int) the result would still be implementation-defined (e.g. consider object files compiled with incompatible compiler options/ABIs/runtimes).
So, either we talk in full generality and conclude that any program made of more than one TU is potentially wrong, or we must take for granted that the linking process is 'valid' (only the implementation knows) and hence take for granted that if a function from some source language (in this case C) promises to return a 'pointer to an existing int', then the same function in the destination language (C++) must still return a 'pointer to an existing int' (otherwise, following [dcl.link], we couldn't say that the linkage has been 'achieved', returning to the no man's land).
So, in my opinion, the real problem is assessing what an 'existing' int is in C and C++, comparatively. As I read the corresponding standards, in both languages an int's lifetime basically begins when storage is reserved for it: in the OP's case of an object with allocated (in C) / dynamic (in C++) storage duration, this occurs (on the C side) when the effective type of the lvalue *pointer_to_int becomes int (e.g. when it's assigned a value; until then, the not-yet-an-int may trap(*)).
This does not happen in the OP's case: the malloc result has no effective type yet. So, that int exists in neither C nor C++; it's just a reinterpreted pointer.
That said, the C++ part of the OP's code assigns just after returning from foo(); if this was intended, then given that malloc() in C++ is required to have C semantics, a placement new on the C++ side would suffice to make it valid (as the provided links show).
So, summarizing, either the C code should be fixed to return a pointer to an existing int (by assigning to it) or the C++ code should be fixed by adding placement new; a sketch of the latter follows the footnote below. (sorry for the lengthy arguing ... :))
(*) Here I'm not claiming that the only issue is the existence of trap representations; if it were, one could argue that the result of foo() is an indeterminate value on the C++ side, hence something that you can safely assign to. Clearly this is not the case, because there are also aliasing rules to take into account ...
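For concreteness, a minimal sketch of the C++-side fix, reusing the OP's foo(); the C-side fix would instead assign through the pointer inside foo() before returning:
#include "foo.h"
#include <cstdlib>
#include <new>

int main() {
    // Placement new begins the lifetime of an int in the storage foo() allocated.
    int* p = new (foo()) int;
    *p = 42;
    std::free(p); // still assumes C and C++ share the same allocator
}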

I can identify two parts of this question that should be addressed separately.
Object lifetime
It has been established (see below) placement new is required to create objects
I posit that this area of the standard contains ambiguity, omission, contradiction, and/or gratuitous incompatibility with existing practice, and should therefore be considered broken.
The only people who should be interested in what a broken part of the standard actually says are the people responsible for fixing the breakage. Other people (language users and language implementors alike) should defer to existing practice and common sense. Both of which say that one does not need new to create an int, malloc is enough.
This document identifies the problem and proposes a fix (thanks @T.C. for the link).
C compatibility
Assume int is the same for C and C++
It is not enough to assume that.
One also needs to assume that int* is the same, that the same memory is accessible by C and C++ functions linked together in a program, and that the C++ implementation does not define the semantics of calls to functions written in the C programming language to be wiping your hard drive and stealing your girlfriend. In other words, that C and C++ implementations are compatible enough.
None of this is stipulated by the standard or should be assumed.
Indeed, there are C implementations that are incompatible with each other, so they cannot be both compatible with the same C++ implementation.
The only thing the standard says is "Every implementation shall provide for linkage to functions written in the C programming language" ([dcl.link]). The semantics of such linkage is left undefined.
Here, as before, the best course of action is to defer to existing practice and common sense. Both of which say that a C++ implementation usually comes bundled with a compatible enough C implementation, with the linkage working as one would expect.

The question is meaningless. Sorry. This is the only "lawyer" answer possible.
It is meaningless because the C++ and C languages ignore each other, as they ignore anything else.
Nothing in either language is described in terms of a low-level implementation (which is ridiculous for languages often described as "high-level assembly"). Both C and C++ are specified (if you can call that a specification) at a very abstract level, and the high and low levels are never reconnected. This generates endless debates about what undefined behavior means in practice, how unions work, etc.

Although neither the C Standard nor, so far as I know, the C++ Standard officially recognizes the concept, almost any platform which allows programs produced by different compilers to be linked together will support opaque functions.
When processing a call to an opaque function, a compiler will start by ensuring that the value of all objects that might legitimately be examined by outside code is written to the storage associated with those objects. Once that is done, it will place the function's arguments in places specified by the platform's documentation (the ABI, or Application Binary Interface) and perform the call.
Once the function returns, the compiler will assume that any objects which an outside function could have written may have been written, and will thus reload any such values from the storage associated with those objects the next time they are used.
If the storage associated with an object holds a particular bit pattern when an opaque function returns, and if the object would hold that bit pattern when it has a defined value, then a compiler must behave as though the object has that defined value, without regard for how it came to hold that bit pattern.
The concept of opaque functions is very useful, and I see no reason that the C and C++ Standards shouldn't recognize it, nor provide a standard "do nothing" opaque function. To be sure, needlessly calling opaque functions will greatly impede what might otherwise be useful optimizations, but being able to force a compiler to treat actions as opaque function calls when needed may make it possible to enable more optimizations elsewhere.
Unfortunately, things seem to be going in the opposite direction, with build systems increasingly trying to apply "whole program" optimization. WPO would be good if there were a way of distinguishing between function calls that were opaque because the full "optimization barrier" was needed, from those which had been treated as opaque simply because there was no way for optimizers to "see" across inter-module boundaries. Unless or until proper barriers are added, I don't know any way to ensure that optimizers won't get "clever" in ways that break code which would have had defined behavior with the barriers in place.
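The Standard itself offers no such barrier, but as an illustration, a common compiler-specific stand-in (GCC/Clang extended asm; not portable, and only a sketch of the idea) behaves much like the "do nothing" opaque function described above:
// GCC/Clang only: an empty asm statement with a "memory" clobber.
// The compiler must assume the statement may read or write any object,
// so values are flushed to storage before it and reloaded after it.
static inline void opaque_barrier(void) {
    asm volatile("" ::: "memory");
}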

I believe it is legal now, and retroactively since C++98!
Indeed, the C++ specification wording prior to C++20 defined an object as (e.g. C++17 wording, [intro.object]):
The constructs in a C++ program create, destroy, refer to, access, and
manipulate objects. An object is created by a definition (6.1), by a
new-expression (8.5.2.4), when implicitly changing the active member
of a union (12.3), or when a temporary object is created (7.4, 15.2).
The possibility of creating an object with a malloc allocation was not mentioned, making it de facto undefined behavior.
It was then viewed as a problem, and this issue was addressed later by https://wg21.link/P0593R6 and accepted as a DR against all C++ versions since C++98 inclusive, then added into the C++20 spec, with the new wording.
The wording of the standard is quite vague and may even seem tautological, defining well-defined implicitly-created objects (6.7.2.11 Object model [intro.object]) as:
implicitly-created objects whose address is the address of the start
of the region of storage, and produce a pointer value that points to
that object, if that value would result in the program having defined
behavior [...]
The example given in C++20 spec is:
#include <cstdlib>
struct X { int a, b; };
X *make_x() {
    // The call to std::malloc implicitly creates an object of type X
    // and its subobjects a and b, and returns a pointer to that X object
    // (or an object that is pointer-interconvertible ([basic.compound]) with it),
    // in order to give the subsequent class member access operations
    // defined behavior.
    X *p = (X*)std::malloc(sizeof(struct X));
    p->a = 1;
    p->b = 2;
    return p;
}
It seems that the object created in a C function, as in the OP's question, falls into this category and is a valid object. The same would also be the case for allocation of C structs with malloc.

No, the int does not exist, as explained in the linked Q/As. An important standard quote reads like this in C++14:
1.8 The C++ object model [intro.object]
[...] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the
implementation (12.2) when needed. [...]
(12.2 is a paragraph about temporary objects)
The C++ standard has no rules for interfacing C and C++ code. A C++ compiler can only analyze objects created by C++ code, not some bits passed to it from an external source like a C program, a network interface, etc.
Many rules are tailored to make optimizations possible. Some of them are only possible if the compiler does not have to assume uninitialized memory contains valid objects. For example, the rule that one may not read an uninitialized int would not make sense otherwise, because if ints may exist anywhere, why would it be illegal to read an indeterminate int value?
This would be a standard-compliant way to write the program:
#include "foo.h"
#include <cstring>

int main() {
    void* p = foo();
    int i = 42;
    std::memcpy(p, &i, sizeof(int));
    //std::free(p); //this works only if C and C++ use the same heap
}

Related

How does binary I/O of POD types not break the aliasing rules?

Twenty-plus years ago, I wouldn't have thought (and didn't think) anything of doing binary I/O with POD structs:
struct S { std::uint32_t x; std::uint16_t y; };
S s;
read(fd, &s, sizeof(s)); // assume this succeeds and reads sizeof(s) bytes
std::cout << s.x + s.y;
(I'm ignoring padding and byte order issues, because they're not part of what I am asking about.)
"Obviously", we can read into s and the compiler is required to assume that the contents of s.x and s.y are aliases by read(). So, s.x after the read() isn't undefined behaviour (because s was uninitialized).
Likewise in the case of
S s = { 1, 2 };
read(fd, &s, sizeof(s)); // assume this succeeds and reads sizeof(s) bytes
std::cout << s.x + s.y;
the compiler can't presume that s.x is still 1 after the read().
Fast forward to the modern world, where we actually have to follow the aliasing rules and avoid undefined behaviour, and so on, and I have been unable to prove to myself that this is allowed.
In C++14, for example, [basic.types] ¶2 says:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. [Footnote 42] If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
¶4 says:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by
the object of type T, where N equals sizeof(T).
[basic.lval] ¶10 says:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [Footnote 54]
...
— a char or unsigned char type.
[Footnote 54: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]
Taken together, I think that this is the standard saying that "you can form an unsigned char or char pointer to any trivially copyable (and thus POD) type and read or write its bytes". In fact, in N2342, which gave us the modern wording, the introductory table says:
Programs can safely apply coding optimizations, particularly std::memcpy.
and later:
Yet the only data member in the class is an array of char, so programmers intuitively expect the class to be memcpyable and binary I/O-able.
With the proposed resolution, the class can be made into a POD by making the default constructor trivial (with N2210 the syntax would be endian()=default), resolving all the issues.
It really sounds like N2342 is trying to say "we need to update the wording to make it so you can do I/O like read() and write() for these types", and it really seems like the updated wording was made standard.
Also, I often hear reference to "the std::memcpy() hole" or similar where you can use std::memcpy() to basically "allow aliasing". But the standard doesn't seem to call out std::memcpy() specifically (and in fact in one footnote mentions it along with std::memmove() and calls it an "example" of a way to do this).
Plus, there's the fact that I/O functions like read() tend to be OS-specific from POSIX and thus aren't discussed in the standard.
So, with all this in mind, my questions are:
What actually guarantees that we can do real-world I/O of POD structs (as shown above)?
Do we actually need to std::memcpy() the content into and out of unsigned char buffers (surely not) or can we directly read into the POD types?
Do the OS I/O functions "promise" that they manipulate the underlying memory "as if by reading or writing unsigned char values" or "as if by std::memcpy()"?
What concerns should I have when there are layers (such as Asio) between me and the raw I/O functions?
Strict aliasing is about accessing an object through a pointer/reference to a type other than that object's actual type. However, the rules of strict aliasing permit accessing any object of any type through a pointer to an array of bytes. And this rule has been around since at least C++14.
Now, that doesn't mean much, since something has to define what such an access means. For that (in terms of writing), we only really have two rules: [basic.types]/2 and /3, which cover copying the bytes of Trivially Copyable types. The question ultimately boils down to this:
Are you reading the "the underlying bytes making up [an] object" from the file?
If the data you're reading into your s was in fact copied from the bytes of a live instance of S, then you're 100% fine. It's clear from the standard that performing fwrite writes the given bytes to a file, and performing fread reads those bytes from the file. Therefore, if you write the bytes of an existing S instance to a file, and read those written bytes into an existing S, you have performed the equivalent of copying those bytes.
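For instance, a round trip of that sort might look like this (a sketch with error handling omitted; the file name and function are illustrative):
#include <cstdint>
#include <cstdio>

struct S { std::uint32_t x; std::uint16_t y; };

void round_trip(const char* path) {
    S out = {1, 2};
    std::FILE* f = std::fopen(path, "wb");
    std::fwrite(&out, sizeof out, 1, f); // writes the bytes of a live S
    std::fclose(f);

    S in;
    f = std::fopen(path, "rb");
    std::fread(&in, sizeof in, 1, f); // copies those same bytes back into a live S
    std::fclose(f);
}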
Where you run into technical issues is when you start getting into the weeds of interpretation. It is reasonable to interpret the standard as defining the behavior of such a program even when the writing and the reading happen in different invocations of the same program.
Concerns arise in one of two cases:
1: When the program which wrote the data is actually a different program than the one who read it.
2: When the program which wrote the data did not actually write an object of type S, but instead wrote bytes that just so happen to be legitimately interpret-able as an S.
The standard doesn't govern interoperability between two programs. However, C++20 does provide a tool that effectively says "if the bytes in this memory contain a legitimate object representation of a T, then I'll return a copy of what that object would look like." It's called std::bit_cast; you can pass it an array of bytes of sizeof(T), and it'll return a copy of that T.
And you get undefined behavior if you're a liar. And bit_cast doesn't even compile if T is not trivially copyable.
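A sketch of how that looks in C++20 (the struct and helper here are illustrative assumptions, not a fixed API):
#include <array>
#include <bit>
#include <cstdint>

struct S { std::uint32_t x; std::uint16_t y; };

S from_bytes(const std::array<unsigned char, sizeof(S)>& bytes) {
    // Returns a copy of the S whose object representation is in bytes;
    // undefined behavior if bytes does not hold a valid representation of S.
    return std::bit_cast<S>(bytes);
}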
However, doing a byte copy directly into a live S from a source that wasn't technically an S but totally could be an S is a different matter. There isn't wording in the standard to make that work.
Our friend P0593 proposes a mechanism for explicitly declaring such an assumption, but it didn't quite make it into C++20.
The type-access rules in every version of the C and C++ Standard to date are based upon the C89 rules, which were written with the presumption that implementations intended for various tasks would uphold the Spirit of C principle described in the published Rationale: "Don't prevent [or otherwise interfere with] the programmer from doing what needs to be done [to accomplish those tasks]." The authors of C89 would have seen no reason to worry about whether the rules as written actually required compilers to support constructs that everyone would agree they should (e.g. allocating storage via malloc, passing it to fread, and then using it as a standard-layout structure type), since they would expect such constructs to be supported on any compiler whose customers needed them, regardless of whether the rules as written actually required such support.
There are many situations where constructs which should "obviously" work actually invoke UB, because the authors of the Standard saw no need to worry about whether the rules would, for example, forbid a compiler given the code:
struct S { int dat[10]; } x, y;
void test(int i)
{
    y = x;
    y.dat[i] = 1; // equivalent to *(y.dat + i) = 1;
    x = y;
}
from assuming that object y of type struct S could not possibly be accessed by the dereferenced int* on the marked line(*), and thus need not be copied back to object x. For a compiler to make such an assumption when it can see that the pointer is derived from a struct S would have been universally recognized as obtuse regardless of whether or not the Standard would forbid it, but the question of exactly when a compiler should be expected to "see" how a pointer was produced was a Quality of Implementation issue outside the Standard's jurisdiction.
(*) In fact, the rules as written would allow a compiler to make such an assumption, since the only types of lvalue that may be used to access a struct S would be that structure type, qualified versions of it, types derived from it, or character types.
It's sufficiently obvious that functions like fread() should be usable on standard-layout structures that quality compilers will generally support such usage without regard for whether the Standard would actually require them to do so. Moving such questions from Quality of Implementation issues to actual conformance issues would require adopting new terminology to describe what a statement like int *p = x.dat+3; does with the stored value of x [it should cause it to be accessible via p under at least some circumstances], and more importantly would require that the Standard itself affirm a point which is currently relegated to the published Rationale--that it is not intended to say anything bad about code which will only run on implementations that are suitable for its purpose, nor to say anything good about implementations which, although conforming, aren't suitable for their claimed purposes.

What is the purpose of std::launder?

P0137 introduces the function template std::launder and makes many, many changes to the standard in the sections concerning unions, lifetime, and pointers.
What is the problem this paper is solving? What are the changes to the language that I have to be aware of? And what are we laundering?
std::launder is aptly named, though only if you know what it's for. It performs memory laundering.
Consider the example in the paper:
struct X { const int n; };
union U { X x; float f; };
...
U u = {{ 1 }};
That statement performs aggregate initialization, initializing the first member of U with {1}.
Because n is a const variable, the compiler is free to assume that u.x.n shall always be 1.
So what happens if we do this:
X *p = new (&u.x) X {2};
Because X is trivial, we need not destroy the old object before creating a new one in its place, so this is perfectly legal code. The new object will have its n member be 2.
So tell me... what will u.x.n return?
The obvious answer will be 2. But that's wrong, because the compiler is allowed to assume that a truly const variable (not merely a const&, but an object variable declared const) will never change. But we just changed it.
[basic.life]/8 spells out the circumstances when it is OK to access the newly created object through variables/pointers/references to the old one. And having a const member is one of the disqualifying factors.
So... how can we talk about u.x.n properly?
We have to launder our memory:
assert(*std::launder(&u.x.n) == 2); //Will be true.
Money laundering is used to prevent people from tracing where you got your money from. Memory laundering is used to prevent the compiler from tracing where you got your object from, thus forcing it to avoid any optimizations that may no longer apply.
Another of the disqualifying factors is if you change the type of the object. std::launder can help here too:
alignas(int) char data[sizeof(int)];
new(&data) int;
int *p = std::launder(reinterpret_cast<int*>(&data));
[basic.life]/8 tells us that, if you allocate a new object in the storage of the old one, you cannot access the new object through pointers to the old. launder allows us to side-step that.
std::launder is a misnomer. This function performs the opposite of laundering: it soils the pointed-to memory, to remove any expectation the compiler might have regarding the pointed-to value. It precludes any compiler optimizations based on such expectations.
Thus in @NicolBolas' answer, the compiler might be assuming that some memory holds some constant value; or is uninitialized. You're telling the compiler: "That place is (now) soiled, don't make that assumption".
If you're wondering why the compiler would always stick to its naive expectations in the first place, and would need you to conspicuously soil things for it - you might want to read this discussion:
Why introduce `std::launder` rather than have the compiler take care of it?
... which led me to this view of what std::launder means.
I think there are two purposes of std::launder.
A barrier for constant folding/propagation, including devirtualization.
A barrier for fine-grained object-structure-based alias analysis.
Barrier for overaggressive constant folding/propagation (abandoned)
Historically, the C++ standard allowed compilers to assume that the value of a const-qualified or reference non-static data member obtained in certain ways was immutable, even if its containing object is non-const and may be reused by placement new.
In C++17/P0137R1, std::launder is introduced as a functionality that disables the aforementioned (mis-)optimization (CWG 1776), which is needed for std::optional. And as discussed in P0532R0, portable implementations of std::vector and std::deque may also need std::launder, even if they are C++98 components.
Fortunately, such (mis-)optimization is forbidden by RU007 (included in P1971R0 and C++20). AFAIK there's no compiler performing this (mis-)optimization.
Barrier for devirtualization
A virtual table pointer (vptr) can be considered constant during the lifetime of its containing polymorphic object, which is needed for devirtualization. Given that the vptr is not a non-static data member, compilers are still allowed to perform devirtualization based on the assumption that the vptr is not changed (i.e., either the object is still within its lifetime, or it has been reused by a new object of the same dynamic type) in some cases.
For some unusual uses that replace a polymorphic object with a new object of different dynamic type (shown here), std::launder is needed as a barrier for devirtualization.
IIUC Clang implemented std::launder (__builtin_launder) with these semantics (LLVM-D40218).
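A minimal sketch of such an unusual use (the types are illustrative, and it assumes the replacement object fits in the same storage):
#include <new>

struct A { virtual int f() { return 1; } };
struct B : A { int f() override { return 2; } };

int replace(A& a) {
    a.~A();
    new (&a) B; // reuse the storage with a different dynamic type
    // a.f() here would be UB: the name a still refers to the old object.
    return std::launder(&a)->f(); // OK: launder yields a pointer to the new B
}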
Barrier for object-structure-based alias analysis
P0137R1 also changes the C++ object model by introducing pointer-interconvertibility. IIUC such change enables some "object-structure-based alias analysis" proposed in N4303.
As a result, P0137R1 makes the direct use of dereferencing a reinterpret_cast'd pointer from an unsigned char [N] array undefined, even if the array is providing storage for another object of correct type. And then std::launder is needed for access to the nested object.
This kind of alias analysis seems overaggressive and may break many useful code bases. AFAIK it's currently not implemented by any compiler.
Relation to type-based alias analysis/strict aliasing
IIUC std::launder and type-based alias analysis/strict aliasing are unrelated. std::launder requires that a living object of the correct type be at the provided address.
However, it seems that they were accidentally made related in Clang (LLVM-D47607).

Does moving non-POD C++ objects with memcpy always invoke Undefined Behavior?

Specifically, I am interested in the case when:
It is known that there are no external pointers to the object (nor to any of its members).
The object contains no internal self-references.
The source object's destructor is guaranteed to not be invoked.
It would seem that under such circumstances objects should be memcpy-movable, even if they have user-defined constructors, destructors, or virtual functions. However, I am wondering if this is still considered UB, which an overzealous compiler may take as an invitation to format my hard drive?
Edit: Please note that I am asking about destructive moving, not copying.
And yes, I am aware of is_trivially_copyable and others. However, is_trivially_copyable covers only a small fraction of C++ classes, whereas the situation described above is extremely common in practice.
Before C++11, yes, moving a non-POD type using memcpy() would invoke undefined behaviour.
Since C++11, the definitions have been tightened, so that is not necessarily true. The following is for C++11 or later.
POD is equivalent to being both "trivial" (which essentially means "can be statically initialised") and "standard-layout" (which means a number of things, including no virtual functions, having the same access control for all non-static members, having no members which are not standard-layout, no base classes of the same type as the first non-static member, and a few other properties).
It is the "trivially copyable" property which allows an object to be copied using memcpy(), as pointed out by Joseph Thomson in comments. A "trivial" type is trivially copyable, but the reverse is not true (e.g. a class might have a non-trivial default constructor - which makes it non-trivial - but still be trivially copyable). It is also possible for a type to be trivial but not standard-layout (which means it is not POD, as a POD type has both trivial AND standard-layout properties).
The trivial property can be tested using std::is_trivial<type> or (for copying) std::is_trivially_copyable<type>. The standard-layout property can be tested using std::is_standard_layout<type>. These are declared in standard header <type_traits>.
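For example, the distinction drawn above can be checked directly (a small sketch; the types are illustrative):
#include <type_traits>

struct Trivial { int a, b; };
struct NonTrivial { NonTrivial() {} int a; }; // user-provided default constructor

static_assert(std::is_trivial<Trivial>::value, "trivial");
static_assert(std::is_standard_layout<Trivial>::value, "standard-layout");
// Not trivial (non-trivial default constructor), yet still trivially copyable:
static_assert(!std::is_trivial<NonTrivial>::value, "not trivial");
static_assert(std::is_trivially_copyable<NonTrivial>::value, "trivially copyable");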
There is nothing undefined here. If there are virtual functions, then the vtable pointer will get copied, too. Not a great idea, but if the types are the same it will work.
The problem is that you need to know the details of everything in the class. Even if there are no pointers, maybe there is a unique id assigned by the constructor, or any of a thousand other things that can't just be copied. Using memcpy is like telling the compiler that you know exactly what you are doing. Make sure that's the case.
Edit: There is a big spread of possible interpretations between "not defined in the C++ standard" and "might format my hard drive with the compiler I'm using." Some clarification follows.
Classic Undefined Behavior
Here is an example of behavior that everyone would probably agree is undefined:
#include <cstdio>

void do_something_undefined()
{
    int i;                 // never initialized
    std::printf("%d", i); // reads an indeterminate value
}
Not Defined By C++ Standard
You can use a different, more strict definition of undefined. Take this code fragment:
#include <cstring>
#include <iostream>

struct MyStruct
{
    int a;
    int b;
    MyStruct() : a(1), b(2)
    {
    }
    ~MyStruct()
    {
        std::cout << "Test: Deleting MyStruct" << std::endl;
    }
};

void not_defined_by_standard()
{
    MyStruct x, y;
    x.a = 5;
    std::memcpy(&y, &x, sizeof(MyStruct));
}
Taking the previous posters at their word on the standard references, this use of memcpy is not defined by the C++ standard. Perhaps it is theoretically possible that a C++ implementation could add a unique ID to each instance of a non-trivially-destructible class, causing the destructors of x and y to fail. Even if this is permitted by the standard, you can certainly know, for your particular compiler, whether it does or does not do this.
I would make a semantic difference here and call this "not defined" instead of "undefined." One problem is the lawyer-like definition of terms: "Undefined Behavior" in the C++ standard means "not defined in the standard", not "gives an undefined result when using a particular compiler." While the standard may not define it, you can absolutely know whether it is defined with your particular compiler. (Note that cppreference for std::memcpy says "If the objects are not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined". This says memcpy is unspecified behavior, not undefined behavior, which is kind of my whole point.)
So, again, you need to know exactly what you are doing. If you're writing portable code that needs to survive for years, don't do it.
Why does the C++ standard not like this code?
Simple: The memcpy call above effectively destructs and re-constructs y. It does this without calling the destructor. The C++ standard rightly does not like this at all.

Is it legal to compare dangling pointers?

Is it legal to compare dangling pointers?
int *p, *q;
{
    int a;
    p = &a;
}
{
    int b;
    q = &b;
}
std::cout << (p == q) << '\n';
Note how both p and q point to objects that have already vanished. Is this legal?
Introduction: The first issue is whether it is legal to use the value of p at all.
After a has been destroyed, p acquires what is known as an invalid pointer value. Quote from N4430 (for discussion of N4430's status see the "Note" below):
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of the deallocated storage become invalid pointer values.
The behaviour when an invalid pointer value is used is also covered in the same section of N4430 (and almost identical text appears in C++14 [basic.stc.dynamic.deallocation]/4):
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
[ Footnote: Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault. — end footnote ]
So you will need to consult your implementation's documentation to find out what should happen here (since C++14).
The term use in the above quotes means necessitating lvalue-to-rvalue conversion, as in C++14 [conv.lval]/2:
When an lvalue-to-rvalue conversion is applied to an expression e, and [...] the object to which the glvalue refers contains an invalid pointer value, the behaviour is implementation-defined.
History: In C++11 this said undefined rather than implementation-defined; it was changed by DR1438. See the edit history of this post for the full quotes.
Application to p == q: Supposing we have accepted in C++14+N4430 that the result of evaluating p and q is implementation-defined, and that the implementation does not define that a hardware trap occurs; [expr.eq]/2 says:
Two pointers compare equal if they are both null, both point to the same function, or both represent the same address (3.9.2), otherwise they compare unequal.
Since it's implementation-defined what values are obtained when p and q are evaluated, we can't say for sure what will happen here. But it must be either implementation-defined or unspecified.
g++ appears to exhibit unspecified behaviour in this case; depending on the -O switch I was able to have it say either 1 or 0, corresponding to whether or not the same memory address was re-used for b after a had been destroyed.
Note about N4430: This is a proposed defect resolution to C++14, that hasn't been accepted yet. It cleans up a lot of wording surrounding object lifetime, invalid pointers, subobjects, unions, and array bounds access.
In the C++14 text, it is defined under [basic.stc.dynamic.deallocation]/4 and subsequent paragraphs that an invalid pointer value arises when delete is used. However it's not clearly stated whether or not the same principle applies to static or automatic storage.
There is a definition of "valid pointer" in [basic.compound]/3 but it is too vague to use sensibly. The [basic.life]/5 footnote refers to the same text to define the behaviour of pointers to objects of static storage duration, which suggests that it was meant to apply to all types of storage.
In N4430 the text is moved from that section up one level so that it does clearly apply to all storage durations. There is a note attached:
Drafting note: this should apply to all storage durations that can end, not just to dynamic storage duration. On an implementation supporting threads or segmented stacks, thread and automatic storage may behave in the same way that dynamic storage does.
My opinion: I don't see any consistent way to interpret the standard (pre-N4430) other than to say that p acquires an invalid pointer value. The behaviour doesn't seem to be covered by any other section besides what we have already looked at. So I am happy to treat the N4430 wording as representing the intent of the standard in this case.
Historically, there have been some systems where using a pointer as an rvalue might cause the system to fetch some information identified by some bits in that pointer. For example, if a pointer could contain the address of an object's header along with an offset into the object, fetching a pointer could cause the system to also fetch some information from that header. If the object has ceased to exist, the attempt to fetch information from its header could fail with arbitrary consequences.
That having been said, in the vast majority of C implementations, all pointers that were alive at some particular moment in time will forever hold the same relationships with regard to the relational and subtraction operators as they had at that particular time. Indeed, in most implementations if one has char *p, one may determine whether it identifies part of an object identified by char *base; size_t size; by checking whether (size_t)(p-base) < size; such comparison will work even retrospectively if there is any overlap in the objects' lifetime.
Unfortunately, the Standard defines no means by which code can indicate that it requires any of the latter guarantees, nor is there a standard means by which code can ask whether a particular implementation can promise any of the latter behaviors and refuse compilation if it does not. Further, some hyper-modern implementations will regard any use of relational or subtraction operators on two pointers as a promise by the programmer that the pointers in question will always identify the same live object, and omit any code which would only be relevant if that assumption didn't hold. Consequently, even though many hardware platforms would be able to offer guarantees that would be useful to many algorithms, there's no safe way by which code can exploit any such guarantees even if code will never need to run on hardware which does not naturally provide them.
The pointers contain the addresses of the variables they reference. The addresses are valid even when the variables that used to be stored there are released / destroyed / unavailable.
As long as you don't try to use the values at those addresses you are safe, meaning *p and *q will be undefined.
Obviously the result is implementation-defined; therefore this code example can be used to study the features of your compiler if one doesn't want to dig into the assembly code.
Whether this is a meaningful practice is totally different discussion.

is reference in c++ internally compiled as pointers or alias?

This tutorial says,
You're probably noticing a similarity to pointers here--and that's true, references are often implemented by the compiler writers as pointers
Similarly, a comment on
What is a reference variable in C++?
says:
Technically not. If bar was a variable you could get its address. A reference is an alias to another variable (not the address of as this would imply the compiler would need to insert a dereference operation). When this gets compiled out bar probably is just replaced by foo
Which statement is true?
Both are true, but under different circumstances.
Semantically, a reference variable just introduces a new name for an object (in the C++ sense of "object").
(There's plenty of confusion around what "variable" and "object" mean, but I think that a "variable" in many other languages is called an "object" in C++, and that's what your second quote refers to as a "variable".)
If this reference isn't stored anywhere or passed as a parameter, it doesn't necessarily have any representation at all (the compiler can just use whatever it refers to instead).
If it is stored (e.g. as a member) or passed as a parameter, the compiler needs to give it a representation, and the most sensible one is to use the address of the object it refers to, which is exactly the same way as pointers are represented.
Note that the standard explicitly says that it is unspecified whether a reference variable has any size at all.
The C++ Standard states, at §8.3.2/4:
It is unspecified whether or not a reference requires storage.
And this non-specification is the main reason why both a pointer implementation and an aliasing implementation are valid implementations.
Therefore, both can be right.
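A small sketch of the two situations (what a given compiler actually does is, again, unspecified; the struct is illustrative):
struct Holder { int& r; }; // a stored reference needs some representation

int main() {
    int x = 0;
    int& alias = x; // likely compiled away entirely: just another name for x
    Holder h{x};    // the member must be materialized, typically as an address
    // On typical implementations sizeof(Holder) == sizeof(int*), i.e. the
    // stored reference is represented exactly like a pointer (not guaranteed).
    return alias + h.r;
}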
They're both true, in a manner of speaking. Whether a reference gets compiled as a pointer is an implementation detail of the compiler, rather than a part of the C++ standard. Some compilers may use regular pointers, and some may use some other form or aliasing the referenced variable.
Consider the following lines:
int var = 0;
int &myRef = var;
Compiler "A" may compile myRef as a pointer, and compiler "B" might use some other method for using myRef.
Of course, the same compiler may also compile the reference in different ways depending on the context. For example, in my example above, myRef may get optimized away completely, whereas in contexts where the reference is required to be present (such as a method parameter), it may be compiled to a pointer.