Missing clarification in the OCaml manual about unboxed/boxed - ocaml

The OCaml manual states:
As another optimization, unboxable record types are represented specially; unboxable record types are the immutable record types that have only one field.
(https://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#ss:c-tuples-and-records)
But which type can that "one field" have? Only native types, or can it be any type?

Any type: the OCaml memory representation is uniform.
More precisely, in terms of memory representation, OCaml values are either integers or pointers to a block, and a block consists of a header followed by a number of values.
Unboxing replaces the boxed representation, a pointer to a block containing a single OCaml value, with that value itself.
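A minimal sketch in OCaml source (type and field names are illustrative); the [@@unboxed] attribute requests exactly this representation for a single-field immutable record, and the field type can be anything, not only a native type:

```ocaml
(* An immutable record with exactly one field: eligible for unboxing.
   The field type here is string, but any type would do. *)
type wrapped = { value : string } [@@unboxed]

let () =
  let s = "hello" in
  let w = { value = s } in
  (* With [@@unboxed], w is represented as the string itself, with no
     surrounding block, so the two values are physically identical. *)
  assert (Obj.repr w == Obj.repr s);
  print_endline w.value
```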

Related

Is converting an integer to a pointer always well defined?

Is this valid C++?
int main() {
int *p;
p = reinterpret_cast<int*>(42);
}
Assuming I never dereference p.
Looking up the C++ standard, we have
C++17 §6.9.2/3 [basic.compound]
3 Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value ([conv.ptr]) for that type, or
an invalid pointer value.
A value of a pointer type that is a pointer to or past the end of an
object represents the address of the first byte in memory
([intro.memory]) occupied by the object or the first byte in memory
after the end of the storage occupied by the object, respectively. [
Note: A pointer past the end of an object ([expr.add]) is not
considered to point to an unrelated object of the object's type that
might be located at that address. A pointer value becomes invalid when
the storage it denotes reaches the end of its storage duration; see
[basic.stc]. — end note ] For purposes of pointer arithmetic
([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past
the end of the last element of an array x of n elements is considered
to be equivalent to a pointer to a hypothetical array element n of x
and an object of type T that is not an array element is considered to
belong to an array with one element of type T.
p = reinterpret_cast<int*>(42); does not fit into the list of possible values. And:
C++17 §8.2.10/5 [expr.reinterpret.cast]
A value of integral type or enumeration type can be explicitly
converted to a pointer. A pointer converted to an integer of
sufficient size (if any such exists on the implementation) and back to
the same pointer type will have its original value; mappings between
pointers and integers are otherwise implementation-defined. [ Note:
Except as described in 6.7.4.3, the result of such a conversion will
not be a safely-derived pointer value. — end note ]
C++ standard does not seem to say more about the integer to pointer conversion. Looking up the C17 standard:
C17 §6.3.2.3/5 (emphasis mine)
An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not
be correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.68)
and
C17 §6.2.6.1/5
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.50) Such a representation is called a trap
representation.
To me, it seems like any value that does not fit into the list in [basic.compound] is a trap representation, thus p = reinterpret_cast<int*>(42); is UB. Am I correct? Is there something else making p = reinterpret_cast<int*>(42); undefined?
This is not UB but implementation-defined, and you already cited why (§8.2.10/5 [expr.reinterpret.cast]). If a pointer has an invalid pointer value, that doesn't necessarily mean it has a trap representation. It can have a trap representation, but then the compiler must document this. All you have here is a pointer that is not safely derived.
Note that we generate pointers with invalid pointer values all the time: when an object is freed by delete, every pointer that pointed to it acquires an invalid pointer value.
Using the resulting pointer is implementation defined as well (not UB):
[...] if the object to which the glvalue refers contains an invalid pointer value ([basic.stc.dynamic.deallocation], [basic.stc.dynamic.safety]), the behavior is implementation-defined.
The example shown is valid C++. On some platforms this is how you access "hardware resources" (and if it were not valid, you would have found a bug/mistake in the standard text).
See also this answer for a better explanation.
Update:
The first sentence of [expr.reinterpret.cast], which you quote yourself:
A value of integral type or enumeration type can be explicitly converted to a pointer.
I recommend you stop reading and rest at this point. The rest is just a lot of detail, including possible implementation-specified behavior, etc. That doesn't make it UB or invalid.
Trap Representations
What: As covered by [C17 §6.2.6.1/5], a trap representation is a non-value. It is a bit pattern that fills the space allocated for an object of a given type, but this pattern does not correspond to a value of that type. It is a special pattern that can be recognized for the purpose of triggering behavior defined by the implementation. That is, the behavior is not covered by the standard, which means it falls under the banner of "undefined behavior". The standard sets out the possibilities for when a trap could be (not must be) triggered, but it makes no attempt to limit what a trap might do. For more information, see A: trap representation.
The undefined behavior associated with a trap representation is interesting in that an implementation has to check for it. The more common cases of undefined behavior were left undefined so that implementations do not need to check for them. The need to check for trap representations is a good reason to want few trap representations in an efficient implementation.
Who: The decision of which bit patterns (if any) constitute trap representations falls to the implementation. The standards do not force the existence of trap representations; when trap representations are mentioned, the wording is permissive, as in "might be", as opposed to demanding, as in "shall be". Trap representations are allowed, not required. In fact, N2091 came to the conclusion that trap representations are largely unused in practice, leading up to a proposal to remove them from the C standard. (It also proposes a backup plan if removal proves infeasible: explicitly call out that implementations must document which representations are trap representations, as there is no other way to know for sure whether or not a given bit pattern is a trap representation.)
Why: Theoretically, a trap representation could be used as a debugging aid. For example, an implementation could declare that 0xDDDD is a trap representation for pointer types, then choose to initialize all otherwise uninitialized pointers to this bit pattern. Reading this bit pattern could trigger a trap that alerts the programmer to the use of an uninitialized pointer. (Without the trap, a crash might not occur until later, complicating the debugging process. Sometimes early detection is the key.) In any event, a trap representation requires a trap of some sort to serve a purpose. An implementation would not define a trap representation without also defining its trap.
My point is that trap representations must be specified. They are deliberately removed from the set of values of a given type. They are not simply "everything else".
Pointer Values
C++17 §6.9.2/3 [basic.compound]
This section defines what an invalid pointer value is. It states "Every value of pointer type is one of the following" before listing four possibilities. That means that if you have a pointer value, then it is one of the four possibilities. The first three are fully specified (pointer to object or function, pointer past the end, and null pointer). The last possibility (invalid pointer value) is not fully specified elsewhere, so it becomes the catch-all "everything else" entry in the list (it is a "wild card", to borrow terminology from the comments). Hence this section defines "invalid pointer value" to mean a pointer value that does not point to something, does not point to the end of something, and is not null. If you have a pointer value that does not fit one of those three categories, then it is invalid.
In particular, if we agree that reinterpret_cast<int*>(42) does not point to something, does not point to the end of something, and is not null, then we must conclude that it is an invalid pointer value. (Admittedly, one could assume that the result of the cast is a trap representation for pointers in some implementation. In that case, yes, it does not fit into the list of possible pointer values because it would not be a pointer value, hence it's a trap representation. However, that is circular logic. Furthermore, based upon N2091, few implementations define any trap representations for pointers, so the assumption is likely groundless.)
[ Note: [...] A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see [basic.stc]. — end note ]
I should first acknowledge that this is a note. It explains and clarifies without adding new substance. One should expect no definitions in a note.
This note gives an example of an invalid pointer value. It clarifies that a pointer can (perhaps surprisingly) change from "points to an object" to "invalid pointer value" without changing its value. Looking at this from a formal logic perspective, this note is an implication: "if [something] then [invalid pointer]". Viewing this as a definition of "invalid pointer" is a fallacy; it is merely an example of one of the ways one can get an invalid pointer.
Casting
C++17 §8.2.10/5 [expr.reinterpret.cast]
A value of integral type or enumeration type can be explicitly converted to a pointer.
This explicitly permits reinterpret_cast<int*>(42). Therefore, the behavior is defined.
To be thorough, one should make sure there is nothing in the standard that makes 42 "erroneous data" to the degree that undefined behavior results from the cast. The rest of [§8.2.10/5] does not do this, and:
C++ standard does not seem to say more about the integer to pointer conversion.
Is this valid C++?
Yes.

Is reinterpret casting an integer to a pointer bijective if the integer is the same size as a pointer?

Given an integer type IntT such that sizeof(IntT) == sizeof(void*), and a variable of said type i, is it guaranteed that reinterpret_cast<IntT>(reinterpret_cast<void*>(i)) == i? This is similar to this question, but that question was looking at an arbitrary-sized integer, so the answer was a straightforward no. Limiting it to integers of exactly the same size as a pointer makes it more interesting.
It strikes me as though the answer would have to be "yes," because the specification states that there exists a mapping to any integer large enough to hold the pointer value. If the variables are the same size, then that mapping must be bijective. If it's bijective, then that also means the conversion from int to void* must also be bijective.
But is there a hole in that logic? Is there a wiggle word in the spec that I'm not accounting for?
I don't think this is guaranteed. The Standard guarantees that a pointer converted to a suitably large integer and back will have its original value. From this it follows that there is a mapping from pointers to a subset of the suitably large integers and back. What it does not imply is that for every suitably-large integer value, there is a corresponding pointer value…
As pointed out by DavisHerring in the comments below, this means that the mapping is injective, but does not have to be surjective and, thus, bijective. I believe what the standard implies in mathematical terms would be that there is a left-unique and left-total relation between pointers and integers, not a bijective function.
Just imagine some weird architecture where, for some reason, every third bit of an address must be zero. Or a slightly more reasonable architecture that uses only the lower 42 bits of a 64-bit value to store an address. Independently of how much sense that would make, the compiler would be free to assume that an integer value being cast to a pointer follows the pattern of a valid address and, e.g., mask out every third bit or use only those lower address bits, respectively…

C++ How can I assign a datatype to a binary sequence?

I have a binary sequence. This sequence represents an arbitrary precision integer but as far as the computer is concerned, it's just a binary sequence. I'm working in C++, with the multiprecision library. I only know how to assign values to the arbitrary precision datatype:
mp::cpp_int A = 51684861532215151;
How can I take a binary sequence and directly assign it to the datatype mp::cpp_int? I realize I can go through each bit and add 2^bit wherever I hit a 1, but I'm trying to avoid doing that.
REPLY:
Galik: My compiler (Visual Studio 2013) isn't liking that for some reason.
mp::cpp_int A = 0b0010011;
It keeps putting the red squiggly after the first 0.
Also yup, boost multiprecision.
How to construct a particular type of big integer from a sequence of raw bits depends on that particular type, on the various constructors/methods that it offers for the purpose and/or what operator overloads are available.
The only generic mechanisms involve constructing a big integer with one word's worth of low-order bits (since such a constructor is almost universally available) and then using arithmetic to push the bits in, one bit at a time or one word's worth of bits at a time. This reduces the dependence on particulars of the given type to a minimum and it may work across a wide range of types completely unchanged, but it is rather cumbersome and not very efficient.
The particular type of big integer shown in your code snippet looks like boost::multiprecision::cpp_int, and Olaf Dietsche has already provided a link to its main documentation page. Conversion to and from raw binary formats for this type is documented on the page Importing and Exporting Data to and from cpp_int and cpp_bin_float, including code examples such as initialising a cpp_int from a vector<byte>.

Why do we need 1, 2, 4, 8 bytes to store a logical variable in Fortran?

I don't understand: since the logical type has only two cases, true and false, why do we need logical(1), logical(2), logical(4), and logical(8) in Fortran?
We just need 1 bit.
Can somebody give an explanation?
First, Fortran doesn't say that we have logical types taking up 1, 2, 4 and 8 bytes each, and they certainly aren't called logical(1), logical(2), logical(4), and logical(8). An implementation may choose to offer such kinds and to call them by those names.
A logical variable can indeed take only one of two values. From the (F90, although F2008 says the same in a different place) standard, 4.3.2.2:
The logical type has two values which represent true and false.
A processor must provide one or more representation methods for data of type logical. Each such method is characterized by a value for a type parameter called the kind type parameter.
[Emphasis here and later verbatim.]
For a logical type of default kind the rules of storage association (14.6.3.1) say that:
(1) A nonpointer scalar object of type default integer, default real, or default logical occupies a single numeric storage unit.
(5) A nonpointer scalar object of type [..] nondefault logical [..] occupies a single unspecified storage unit that is different for each case.
So, the compiler must offer a logical type which is of the same size as an integer and real type, but, equally, it can offer representations taking up 1 bit, 1 byte, or whatever. The kind number, and size, for any given representation (hence my first paragraph: the question isn't universally valid) is implementation-specific. That said, there is no SELECTED_LOGICAL_KIND (or such) intrinsic.
As to why multiple representations can be useful, that comes down to offering a choice, perhaps for special cases such as arrays and ideal memory management (some people like to play non-portable tricks). However, memory access/alignment requirements suggest that a scalar logical would be at least one byte (or padding would make it the same). For C interoperability (F2003+) there is a kind C_BOOL corresponding to the companion C processor's _Bool, which needn't be the same size.
LOGICAL
The FORTRAN standard requires logical variables to be the same size as INTEGER/REAL variables (see the chapter on memory management), although only one bit is really needed to implement this type.
The values used to implement the logical constants .TRUE. and
.FALSE. differ:
            |    VMS    |    Sun    |   IRIX    |
 -----------|-----------|-----------|-----------|
   .TRUE.   |    -1     |     1     |     1     |
 -----------|-----------|-----------|-----------|
  .FALSE.   |     0     |     0     |     0     |
Unix machines naturally adopted the C convention, VMS has a seemingly
strange value for .TRUE., however on a closer look you will see that
if .FALSE. is "all bits 0", .TRUE. should be "all bits 1", in two's
complement signed integers the number with all bits set to 1 is -1.
http://www.ibiblio.org/pub/languages/fortran/ch2-3.html
It looks like it's for simpler memory management:
http://www.ibiblio.org/pub/languages/fortran/ch2-19.html

Is it safe to cast an int to void pointer and back to int again?

In C and/or C++: is it safe to cast an int to void pointer and back to int again?
Based on the question "C++: Is it safe to cast pointer to int and later back to pointer again?".
On most modern-day commonplace machines, probably.
However, I'd bet that there is some obscure compiler or configuration (say, a 16-bit addressed machine that uses 32-bit integer arithmetic) where that is not the case.
A uintptr_t is guaranteed to hold both, though, so use that type if you want to.
Here is an example where converting a pointer to an integer may not result in the same pointer when converting the integer to a pointer.
Consider an architecture that has 20-bit addresses and uses two 16-bit quantities to describe a location. Let one quantity be the SEGMENT and the other the OFFSET. A location is designated by the notation SEGMENT:OFFSET.
The actual 20-bit (physical) address is calculated by:
address = segment * 16 + offset
Using this notation, there can be more than one SEGMENT:OFFSET pair that describe the same physical address.
When converting to an integer, a 32-bit (unsigned) quantity is used (to simplify internal calculations in the processor). The problem is how to convert the physical address back into the same SEGMENT:OFFSET pair that was used in the creation of the physical address.
A generic equation for converting the integer back to a pointer is:
offset = address & 0xF;  // Keep the low-order 4 bits as the offset.
segment = address >> 4;  // The remaining high-order bits become the segment.
Although the physical address of this new segment and offset is equal to the physical address of the original SEGMENT:OFFSET, the segments and offsets are not guaranteed to be the same.
To optimize code, there are processor instructions that use relative addressing in a segment. These instructions may get messed up when the SEGMENT value changes due to conversion from a physical address.
In this scenario, converting from a pointer to an integer is possible. HOWEVER, converting from the integer to the pointer IS STRONGLY DISCOURAGED. Hard to debug errors could occur during run-time.
Bonus question: Can you name the actual architecture?
Why would you want to do this?
Reply for C (I don't know enough about C++ for that): No, the behavior of casting an int to void* is not defined. First of all, you should always use uintptr_t, if you have it, for such a thing; using int is an abuse.
Then, C does not guarantee anything if your uintptr_t value doesn't come from a valid address. It only guarantees the other direction. Don't do it.
Edit: Here is the relevant part of the C99 standard. As you can see all alarms can go off...
An integer may be converted to any
pointer type. Except as previously
specified, the result is
implementation-defined, might not be
correctly aligned, might not point to
an entity of the referenced type, and
might be a trap representation
The last is particularly embarrassing, since it means that a pointer value obtained this way cannot be used anymore until it is overwritten:
Certain object representations need
not represent a value of the object
type. If the stored value of an object
has such a representation and is read
by an lvalue expression that does not
have character type, the behavior is
undefined. ... Such a representation is
called a trap representation.
No. A void pointer is no different from any other pointer with respect to size. Hence it will run into exactly the same types of issues as other pointer types.
It's implementation-defined, just like the previous question, and for the same reason. It's less likely to result in misbehavior, but it's still implementation-defined.
No. There might be certain circumstances where it appears to work for a certain compiler and settings, and then two years later you spend weeks debugging why something changed and the conversion no longer works as expected.
If you just design your code in a way that doesn't need this sort of behavior (best case avoids use of such conversion at all, worst case use char[]) then you won't have to worry about obscure bugs in the future.
Not necessarily. Depends on the size of a pointer. Why would you want to do this?
If the range of your integers is fairly small, you could always do something like:
static const char dummy[MAXVAL];
and then use dummy+i as a way of encoding i as a pointer. This is 100% portable. If you only care that it's 99% portable, you could use (char *)0 + i.