Resolve (u)int_fastX_t at compile time - c++

Implementations of the C++ standard typedef the (u)int_fastX_t types as one of their built-in types. This requires research into which type is the fastest, but there cannot be one fastest type for every case.
Wouldn't it increase performance to resolve such types at compile time, choosing the optimal type for the actual use? The compiler would analyze how a _fast variable is used and then choose the optimal type. Factors coming into play could be alignment and the kind of operations used with the variable.
This would effectively make those types a language feature.
This could introduce bugs when the compiler suddenly decides to choose another width for such a variable. But one shouldn't use a _fast type in use cases where the behaviour depends on the width anyway.
Is such compile-time resolution permitted by the standard?
If yes, why isn't it implemented as of today?
If no, why isn't it in the standard?

No, this is not permitted by the standard. Keep in mind that the C++ standard defers to C in this particular area; for example, C++11 defers to C99, as per C++11 1.1/2. Specifically, C++11 18.4.1 Header <cstdint> synopsis /2 states:
The header defines all functions, types, and macros the same as 7.18 in the C standard.
So let's get your first contention out of the way, you state:
Implementations of the C++ standard typedef the (u)int_fastX_t types as one of their built-in types. This requires research into which type is the fastest, but there cannot be one fastest type for every case.
The C standard has this to say, in C99 7.18.1.3 Fastest minimum-width integer types (my italics):
Each of the following types designates an integer type that is usually fastest to operate with among all integer types that have at least the specified width.
The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.
So you're indeed correct that a type cannot be fastest for all possible uses, but this seems not to be what the authors had in mind when defining these types.
The introduction of the fixed-width types was (in my opinion) to solve the problem all those developers had in having different int widths across the various implementations.
Similarly, once a developer knows the range of values they want, the fast minimum-width types give them a way to do arithmetic on those values at the maximum possible speed.
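To make the status quo concrete, here is a minimal sketch (the widths printed are whatever the implementation chose once and for all in <cstdint>; they do not vary with how the variables are used):

#include <cstdint>
#include <cstdio>

int main() {
    // These are ordinary typedefs picked when <cstdint> was written,
    // not per-use-site decisions made by the optimizer.
    std::int_fast16_t counter = 0;    // often 32 or 64 bits wide on x86-64
    std::int_least16_t smallest = 0;  // smallest type with at least 16 bits

    std::printf("int_fast16_t:  %zu bytes\n", sizeof counter);
    std::printf("int_least16_t: %zu bytes\n", sizeof smallest);
}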
Covering your three specific questions in your final paragraph (in bold below):
(1) Is such compile-time resolution permitted by the standard?
I don't believe so. The relevant part of the C standard has this little piece of text:
For each type described herein that the implementation provides, <stdint.h> shall declare that typedef name and define the associated macros.
That seems to indicate that it must be a typedef provided by the implementation and, since there are no "variable" typedefs, it has to be fixed.
There may be wiggle room because it could be possible to provide a different typedef depending on certain environmental considerations but the difficulty in actually implementing this seems very high (see my answer to your third question below).
Chief amongst these is that these adaptable types, should they have external linkage, would require agreement amongst all the compiled translation units when linked together. Having one unit with a 16-bit type and another with a 32-bit type is going to cause all sorts of problems.
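As a purely hypothetical illustration of that point (no real compiler behaves this way; the header and variable names are invented):

// common.h -- hypothetical header included by every translation unit
#include <cstdint>

extern std::int_fast16_t shared_counter;  // every TU must see the same type here

// If int_fast16_t were resolved per use site, one translation unit might end
// up with a 16-bit type and another with a 32-bit type for this very
// declaration, and the resulting objects would no longer link or behave
// consistently. Because int_fast16_t is one fixed typedef per implementation,
// all translation units agree.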
(2) If yes, why isn't it implemented as of today?
I'm pushing "no" as an answer to your first question so I'm not going to speculate on this other than by referring you to the answer to the third question below (it's probably not implemented because it's very hard, with dubious benefits).
(3) If no, why isn't it in the standard?
A standard is a contract between the implementor and the user and describes what the implementor will provide. It's usual that the standards committees tend to be more populated by the former (who aren't that keen on making too much extra work for themselves) than the latter.
For example, I would love to have all the you-beaut C++ data structures in C but this would have the consequence that standards versions would be decades apart rather than years :-)

Practical meaning of std::strong_ordering and std::weak_ordering

I've been reading a bit about C++20's consistent comparison (i.e. operator<=>) but couldn't understand the practical difference between std::strong_ordering and std::weak_ordering (the same goes for the _equality versions, for that matter).
Other than being very descriptive about the substitutability of the type, does it actually affect the generated code? Does it add any constraints for how one could use the type?
Would love to see a real-life example that demonstrates this.
Does it add any constraints for how one could use the type?
One very significant constraint (which wasn't intended by the original paper) was that P0732 adopted strong_ordering as the indicator that a class type can be used as a non-type template parameter; weak_ordering isn't sufficient for that case because of how template equivalence has to work. This is no longer the case, as non-type template parameters no longer work this way (see P1907R0 for an explanation of the issues and P1907R1 for the wording of the new rules).
Generally, it's possible that some algorithms simply require weak_ordering but other algorithms require strong_ordering, so being able to annotate that on the type might mean a compile error (insufficiently strong ordering provided) instead of simply failing to meet the algorithm's requirements at runtime and hence just being undefined behavior. But all the algorithms in the standard library and the Ranges TS that I know of simply require weak_ordering. I do not know of one that requires strong_ordering off the top of my head.
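As a hedged illustration of what annotating such a requirement could look like (dedupe_exact is a made-up name, not a standard algorithm), a template can be constrained on the comparison category so that misuse becomes a compile error:

#include <compare>
#include <concepts>
#include <vector>

// Hypothetical algorithm that treats equivalent elements as fully
// substitutable (e.g. it drops "duplicates"), so it insists on
// strong_ordering at compile time rather than failing subtly at runtime.
template <typename T>
    requires std::same_as<std::compare_three_way_result_t<T>, std::strong_ordering>
void dedupe_exact(std::vector<T>& values) {
    (void)values;  // implementation omitted
}

int main() {
    std::vector<int> v{3, 3, 1};
    dedupe_exact(v);  // OK: int's <=> yields std::strong_ordering
    // A type whose operator<=> returns std::weak_ordering would be rejected
    // here with a compile-time error instead of quietly violating a precondition.
}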
Does it actually affect the generated code?
Outside of the cases where strong_ordering is required, or an algorithm explicitly chooses different behavior based on the comparison category, no.
There really isn't any reason to have std::weak_ordering. It's true that the standard describes operations like sorting in terms of a "strict" weak order, but there's an isomorphism between strict weak orderings and a totally ordered partition of the original set into incomparability equivalence classes. It's rare to encounter generic code that is interested both in the order structure (which considers each equivalence class to be one "value") and in some possibly finer notion of equivalence: note that when the standard library uses < (or <=>) it does not use == (which might be finer).
The usual example for std::weak_ordering is a case-insensitive string, since for instance printing two strings that differ only by case certainly produces different behavior despite their equivalence (under any operator). However, lots of types can have different behavior despite being ==: two std::vector<int> objects, for instance, might have the same contents and different capacities, so that appending to them might invalidate iterators differently.
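Here is a minimal sketch of that usual example (CaseInsensitiveName is a hand-rolled illustration, not a standard type):

#include <algorithm>
#include <cctype>
#include <compare>
#include <cstddef>
#include <string>

struct CaseInsensitiveName {
    std::string value;

    // Case-folded comparison: "Ada" and "ADA" compare equivalent, but the two
    // objects are still observably different, hence weak_ordering.
    std::weak_ordering operator<=>(const CaseInsensitiveName& other) const {
        const std::size_t n = std::min(value.size(), other.value.size());
        for (std::size_t i = 0; i < n; ++i) {
            const int a = std::tolower(static_cast<unsigned char>(value[i]));
            const int b = std::tolower(static_cast<unsigned char>(other.value[i]));
            if (a != b)
                return a < b ? std::weak_ordering::less : std::weak_ordering::greater;
        }
        if (value.size() != other.value.size())
            return value.size() < other.value.size() ? std::weak_ordering::less
                                                     : std::weak_ordering::greater;
        return std::weak_ordering::equivalent;
    }

    bool operator==(const CaseInsensitiveName& other) const {
        return (*this <=> other) == 0;
    }
};

Two names differing only in case compare equivalent under this ordering, yet their value members still differ, which is exactly the behavioural gap the weak/strong distinction is meant to express.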
The simple fact is that the "equality" implied by std::strong_ordering::equivalent but not by std::weak_ordering::equivalent is irrelevant to the very code that stands to benefit from it, because generic code doesn't depend on the implied behavioral changes, and non-generic code doesn't need to distinguish between the ordering types because it knows the rules for the type on which it operates.
The standard attempts to give the distinction meaning by talking about "substitutability", but that is inevitably circular because it can sensibly refer only to the very state examined by the comparisons. This was discussed prior to publishing C++20, but (perhaps for the obvious reasons) not much of the planned further discussion has taken place.

Is C++ considered weakly typed? Why?

I've always considered C++ to be one of the most strongly typed languages out there.
So I was quite shocked to see Table 3 of this paper state that C++ is weakly typed.
Apparently,
C and C++ are considered weakly typed since, due to type-casting, one can interpret a field of a structure that was an integer as a pointer.
Is the existence of type casting all that matters? Does the explicit-ness of such casts not matter?
More generally, is it really generally accepted that C++ is weakly typed? Why?
That paper first claims:
In contrast, a language is weakly-typed if type-confusion can occur silently (undetected), and eventually cause errors that are difficult to localize.
And then claims:
Also, C and C++ are considered weakly typed since, due to type-casting, one can interpret a field of a structure that was an integer as a pointer.
This seems like a contradiction to me. In C and C++, the type-confusion that can occur as a result of casts will not occur silently -- there's a cast! This does not demonstrate that either of those languages is weakly-typed, at least not by the definition in that paper.
That said, by the definition in the paper, C and C++ may still be considered weakly-typed. There are, as noted in the comments on the question already, cases where the language supports implicit type conversions. Many types can be implicitly converted to bool, a literal zero of type int can be silently converted to any pointer type, there are conversions between integers of varying sizes, etc, so this seems like a good reason to consider C and C++ weakly-typed for the purposes of the paper.
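A few of those conversions in one place, as a quick illustrative sketch (take_pointer is just a made-up helper); all of it compiles without a single cast:

#include <iostream>

void take_pointer(const char* p) { std::cout << (p ? "non-null\n" : "null\n"); }

int main() {
    double d = 3.9;
    int i = d;        // fractional part silently discarded: i == 3
    bool b = i;       // any non-zero integer silently becomes true
    unsigned u = -1;  // signed -> unsigned silently wraps to a huge value
    take_pointer(0);  // literal zero silently becomes a null pointer
    std::cout << i << ' ' << b << ' ' << u << '\n';
}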
For C (but not C++), there are also more dangerous implicit conversions that are worth mentioning:
int main() {
    int i = 0;
    void *v = &i;
    char *c = v;
    return *c;
}
For the purposes of the paper, that must definitely be considered weakly-typed. The reinterpretation of bits happens silently, and can be made far worse by modifying it to use completely unrelated types, which has silent undefined behaviour that typically has the same effect as reinterpreting bits, but blows up in mysterious yet sometimes amusing ways when optimisations are enabled.
In general, though, I think there isn't a fixed definition of "strongly-typed" and "weakly-typed". There are various grades, a language that is strongly-typed compared to assembly may be weakly-typed compared to Pascal. To determine whether C or C++ is weakly-typed, you first have to ask what you want weakly-typed to mean.
"weakly typed" is a quite subjective term. I prefer the terms "strictly typed" and "statically typed" vs. "loosely typed" and "dynamically typed", because they are more objective and more precise words.
From what I can tell, people generally use "weakly typed" as a diminutive-pejorative term which means "I don't like the notion of types in this language". It's sort of an argumentum ad hominem (or rather, argumentum ad linguam) for those who can't bring up professional or technical arguments against a particular language.
The term "strictly typed" also has slightly different interpretations; the generally accepted meaning, in my experience, is "the compiler generates errors if types don't match up". Another interpretation is that "there are no or few implicit conversions". Based on this, C++ can actually be considered a strictly typed language, and most often it is considered as such. I would say that the general consensus on C++ is that it is a strictly typed language.
Of course we could try a more nuanced approach to the question and say that parts of the language are strictly typed (this is the majority of the cases), while other parts are loosely typed (a few implicit conversions, e.g. the arithmetic conversions).
Furthermore, there are some programmers, especially beginners who are not familiar with more than a few languages, who don't intend to or can't make the distinction between "strict" and "static", "loose" and "dynamic", and conflate the two - otherwise orthogonal - concepts based on their limited experience (usually the correlation of dynamism and loose typing in popular scripting languages, for example).
In reality, parts of C++ (virtual calls) impose the requirement that the type system be partially dynamic, but other things in the standard require that it be strict. Again, this is not a problem, since these are orthogonal concepts.
To sum up: probably no language fits completely, perfectly into one category or another, but we can say which particular property of a given language dominates. In C++, strictness definitely does dominate.
In contrast, a language is weakly-typed if type-confusion can occur silently (undetected), and eventually cause errors that are difficult to localize.
Well, that can happen in C++, for example:
#define _USE_MATH_DEFINES
#include <iostream>
#include <cmath>
#include <limits>

void f(char n) { std::cout << "f(char)\n"; }
void f(int n)  { std::cout << "f(int)\n"; }
void g(int n)  { std::cout << "g(int)\n"; }

int main()
{
    float fl = M_PI;  // silent conversion to float may lose precision
    f(8 + '0');       // potentially unintended treatment as int
    unsigned n = std::numeric_limits<unsigned>::max();
    g(n);             // potentially unintended conversion to int
}
Also, C and C++ are considered weakly typed since, due to type-casting, one can interpret a field of a structure that was an integer as a pointer.
Ummmm... not via any implicit conversion, so that's a silly argument. C++ allows explicit casting between types, but that's hardly "weak" - it doesn't happen accidentally/silently, as required by the paper's own definition above.
Is the existence of type casting all that matters? Does the explicit-ness of such casts not matter?
Explicitness is a crucial consideration IMHO. Letting a programmer override the compiler's knowledge of types is one of the "power" features of C++, not some weakness. It's not prone to accidental use.
More generally, is it really generally accepted that C++ is weakly typed? Why?
No - I don't think it is accepted. C++ is reasonably strongly typed, and the ways in which it has been lenient that have historically caused trouble have been pruned back, such as implicit casts from void* to other pointer types, and finer grained control with explicit casting operators and constructors.
Well, since the creator of C++, Bjarne Stroustrup, says in The C++ Programming Language (4th edition) that the language is strongly typed, I would take his word for it:
C++ programming is based on strong static type checking, and most techniques aim at achieving a high level of abstraction and a direct representation of the programmer’s ideas. This can usually be done without compromising run-time and space efficiency compared to lower-level techniques. To gain the benefits of C++, programmers coming to it from a different language must learn and internalize idiomatic C++ programming style and technique. The same applies to programmers used to earlier and less expressive versions of C++.
In this video lecture from 1994 he also states that the weak type system of C really bothered him, and that's why he made C++ strongly typed: The Design of C++, a lecture by Bjarne Stroustrup
In General:
There is a confusion around the subject. Some terms differ from book to book (not considering the internet here), and some may have changed over the years.
Below is what I've understood from the book "Engineering a Compiler" (2nd Edition).
1. Untyped Languages
Languages that have no types at all, such as assembly.
2. Weakly Typed Languages:
Languages that have a poor type system.
The definition here is intentionally ambiguous.
3. Strongly Typed Languages:
Languages where each expression has an unambiguous type. These can be further categorised into:
A. Statically Typed: when every expression is assigned a type at compile time.
B. Dynamically Typed: when some expressions can only be typed at runtime.
What is C++ then?
Well, it's strongly typed for sure. And mostly it is statically typed.
But as some expressions can only be typed at runtime, I guess it falls under the 3.B category.
PS1: A note from the book:
A strongly typed language that is statically checkable might be implemented (for some reason) with runtime checking only.
PS2: A third edition was recently released.
I don't own it, so I don't know whether anything has changed in this regard.
But in general, the "Semantic Analysis" chapter has changed both its title and its position in the table of contents.
Let me give you a simple example:
if ( a + b )
C/C++ allows an implicit conversion from a float or an int to a Boolean here.
A strongly-typed language would not allow such an implicit conversion.

Is std::vector<T> a `user-defined type`?

In 17.6.4.2.1/1 and 17.6.4.2.1/2 of the current draft standard restrictions are placed on specializations injected by users into namespace std.
The behavior of a C++ program is undefined if it adds declarations or definitions to namespace std or to a namespace within namespace std unless otherwise specified. A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly prohibited.
I cannot find where in the standard the phrase user-defined type is defined.
One option I have heard claimed is that any type for which std::is_fundamental is false is a user-defined type, in which case std::vector<int> would be a user-defined type.
An alternative answer would be that a user-defined type is a type that a user defines. As users do not define std::vector<int>, and std::vector<int> is not dependent on any type a user defines, std::vector<int> is not a user-defined type.
A practical problem this impacts is: can you inject a specialization of std::hash for std::tuple<Ts...> into namespace std? Being able to do so is somewhat convenient -- the alternative is to create another namespace where we recursively build our hash for std::tuple (and possibly other types in std that do not have hash support), and only if we fail to find a hash in that namespace do we fall back on std.
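For concreteness, a sketch of the kind of specialization being discussed (MyKey is a made-up program-defined type, and the hash combining is simplistic; whether the tuple case is actually permitted is precisely the open question):

#include <cstddef>
#include <functional>
#include <string>
#include <tuple>

struct MyKey { std::string name; };  // unambiguously a user-defined type

namespace std {
    // Uncontroversial: this specialization depends on MyKey.
    template <>
    struct hash<MyKey> {
        size_t operator()(const MyKey& k) const noexcept {
            return hash<string>()(k.name);
        }
    };

    // The contested case: does std::tuple<int, int> "depend on a user-defined type"?
    template <>
    struct hash<tuple<int, int>> {
        size_t operator()(const tuple<int, int>& t) const noexcept {
            const size_t h1 = hash<int>()(get<0>(t));
            const size_t h2 = hash<int>()(get<1>(t));
            return h1 ^ (h2 << 1);
        }
    };
}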
However, if this is legal, then if and when the standard adds a hash specialization for std::tuple to namespace std, code that specialized it already would be broken, creating a reason not to add such specializations in the future.
While I am talking about std::vector<int> as a concrete example, I am really asking whether types defined in std are ever user-defined types. A secondary question is, even if not, does std::tuple<int> become a user-defined type when used by a user? (This gets slippery: what then happens if something inside std defines std::tuple<int>, and you partially specialize hash for std::tuple<Ts...>?)
There is currently an open defect on this problem.
Prof. Stroustrup is very clear that any type that is not built-in is user-defined. See the second paragraph of section 9.1 in Programming Principles and Practice Using C++.
He even specifically calls out “standard library types” as an example of user-defined types. In other words, a user-defined type is any compound type.
Source
The article explicitly mentions that not everyone seems to agree, but this is IMHO mostly wishful thinking and not what the standard (and Prof. Stroustrup) are actually saying, only what some people want to read into it.
When Clause 17 says "user-defined" it means "a type not defined in the standard" so std::vector<int> is not user-defined, neither is std::string, so you cannot specialize std::vector<int> or std::vector<std::string>. On the other hand, struct MyClass is user-defined, because it's not a type defined in the standard, so you can specialize std::vector<MyClass>.
This is not the same meaning of "user-defined" used in clauses 1-16, and that difference is confusing and silly. There is a defect report for this, with some discussion recorded that basically says "yes, the library uses the wrong term, but we don't have a better one".
So the answer to your question is "it depends". If you're talking to a C++ compiler implementor or a core language expert, std::vector<int> is definitely a user-defined type, but if you're talking to a standard library implementor, it is not. More precisely, it's not user-defined for the purposes of 17.6.4.2.1.
One way to look at it is that the standard library is "user code" as far as the core language is concerned. But the standard library has a different idea of "users" and considers itself to be part of the implementation, and only things that aren't part of the library are "user-defined".
Edit: I have proposed changing the library Clauses to use a new term, "program-defined", which means something defined in your program (as opposed to UDTs defined in the standard, such as std::string).
As users do not define std::vector<int>, and std::vector<int> is not dependent on any type a user defines, std::vector<int> is not a user-defined type.
The logical counterargument is that users do define std::vector<int>. You see, std::vector is a class template, and as such it has no direct representation in binary code.
In a sense it gets its binary representation through the instantiation of a type, so the very act of declaring a std::vector<int> object is what gives "soul" to the template (pardon the phrasing). In a program where no one uses a std::vector<int>, that data type does not exist.
On the other hand, following the same argument, std::vector<T> is not a user-defined type; it is not even a type, and it does not exist. Only when we instantiate a type does it mandate how a structure will be laid out; until then we can only argue about it in terms of structure, design, properties and so on.
Note
The above argument (about templates being not code but ... well, templates for code) may seem a bit superficial, but it draws its logic from Scott Meyers's introduction to A. Alexandrescu's book Modern C++ Design. The relevant quote goes like this:
Eventually, Andrei turned his attention to the development of template-based implementations of popular language idioms and design patterns, especially the GoF[*] patterns. This led to a brief skirmish with the Patterns community, because one of their fundamental tenets is that patterns cannot be represented in code. Once it became clear that Andrei was automating the generation of pattern implementations rather than trying to encode patterns themselves, that objection was removed, and I was pleased to see Andrei and one of the GoF (John Vlissides) collaborate on two columns in the C++ Report focusing on Andrei's work.
The draft standard contrasts fundamental types with user-defined types in a couple of (non-normative) places.
The draft standard also uses the term "user-defined" in other contexts, referring to entities created by the programmer or defined in the standard library. Examples include user-defined constructor, user-defined operator and user-defined conversion.
These facts allow us, absent other evidence, to tentatively assume that the intent of the standard is that user-defined type should mean compound type, according to historical usage. Only an explicit clarification in a future standard document can definitely resolve the issue.
Note that the historical usage is not clear on types like int* or struct foo* or void(*)(struct foo****). They are compound, but should they (or some of them) be considered user-defined?

C++11 and [17.5.2.1.3] Bitmask Types

The Standard allows one to choose between an integer type, an enum, and a std::bitset.
Why would a library implementor use one over the other given these choices?
Case in point, llvm's libcxx appears to use a combination of (at least) two of these implementation options:
ctype_base::mask is implemented using an integer type:
<__locale>
regex_constants::syntax_option_type is implemented using an enum + overloaded operators:
<regex>
The gcc project's libstdc++ uses all three:
ios_base::fmtflags is implemented using an enum + overloaded operators: <bits/ios_base.h>
regex_constants::syntax_option_type is implemented using an integer type,
regex_constants::match_flag_type is implemented using a std::bitset
Both: <bits/regex_constants.h>
AFAIK, gdb cannot "detect" the bitfieldness of any of these three choices so there would not be a difference wrt enhanced debugging.
The enum solution and the integer type solution should always use the same space. std::bitset does not seem to guarantee that sizeof(std::bitset<32>) == sizeof(std::uint32_t), so I don't see what is particularly appealing about std::bitset.
The enum solution also seems slightly less type safe because combinations of the masks do not generate an enumerator.
Strictly speaking, the aforementioned is with respect to n3376 and not FDIS (as I do not have access to FDIS).
Any available enlightenment in this area would be appreciated.
The really surprising thing is that the standard restricts it to just three alternatives. Why shouldn't a class type be acceptable? Anyway…
Integral types are the simplest alternative, but they lack type safety. Very old legacy code will tend to use these as they are also the oldest.
Enumeration types are safe but cumbersome, and until C++11 they tended to be fixed to the size and range of int.
std::bitset may have somewhat more type safety in that bitset<5> and bitset<6> are different types, and addition is disallowed, but otherwise it is unsafe much like an integral type. This wouldn't be an issue if they had allowed types derived from std::bitset<N>.
Clearly enums are the ideal alternative, but experience has proven that the type safety is really unnecessary. So they threw implementers a bone and allowed them to take easier routes. The short answer, then, is that laziness leads implementers to choose int or bitset.
It is a little odd that types derived from bitset aren't allowed, but really that's a minor thing.
The main specification that clause provides is the set of operations defined over these types (i.e., the bitwise operators).
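For reference, a hedged sketch of the enum-plus-overloaded-operators approach (the openflags type and its enumerators are invented for illustration, not taken from any library), supplying the bitwise operations the bitmask requirements describe:

#include <cstdint>

// Invented scoped-enum bitmask type; libraries typically do the same with a
// plain enum (e.g. ios_base::fmtflags in libstdc++).
enum class openflags : std::uint32_t {
    none   = 0,
    read   = 1u << 0,
    write  = 1u << 1,
    append = 1u << 2,
};

constexpr openflags operator|(openflags a, openflags b) {
    return static_cast<openflags>(static_cast<std::uint32_t>(a) |
                                  static_cast<std::uint32_t>(b));
}
constexpr openflags operator&(openflags a, openflags b) {
    return static_cast<openflags>(static_cast<std::uint32_t>(a) &
                                  static_cast<std::uint32_t>(b));
}
constexpr openflags operator^(openflags a, openflags b) {
    return static_cast<openflags>(static_cast<std::uint32_t>(a) ^
                                  static_cast<std::uint32_t>(b));
}
constexpr openflags operator~(openflags a) {
    return static_cast<openflags>(~static_cast<std::uint32_t>(a));
}
constexpr openflags& operator|=(openflags& a, openflags b) { return a = a | b; }
constexpr openflags& operator&=(openflags& a, openflags b) { return a = a & b; }
constexpr openflags& operator^=(openflags& a, openflags b) { return a = a ^ b; }

int main() {
    openflags f = openflags::read | openflags::append;
    const bool appending = (f & openflags::append) != openflags::none;
    return appending ? 0 : 1;
}

The representation differs between implementations, but only these operations, not any particular layout, are what the clause actually pins down.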
My preference is to use an enum, but there are sometimes valid reasons to use an integer. Usually ctype_base::mask interacts with the native OS headers, with a mapping from ctype_base::mask to the <ctype.h> implementation-defined constants such as _CTYPE_L and _CTYPE_U used for isupper and islower etc. Using an integer might make it easier to use ctype_base::mask directly with native OS APIs.
I don't know why libstdc++'s <regex> uses a std::bitset. When that code was committed I made a mental note to replace the integer types with an enumeration at some point, but <regex> is not a priority for me to work on.
Why would the standard allow different ways of implementing the library? And the answer is: Why not?
As you have seen, all three options are obviously used in some implementations. The standard doesn't want to make existing implementations non-conforming, if that can be avoided.
One reason to use a bitset could be that its size fits better than an enum or an integer. Not all systems even have a std::uint32_t. Maybe a bitset<24> will work better there?

complex number types in mixing C(99) and C++

I'm writing a math library; the core of it is in C++. Later it may be implemented in pure C (C99, I suppose). I think I need a C-like API so that I can get Python, MATLAB and the like to use the library. My impression is that doing this with C++ is painful.
So is there a good or standard or proper way to cast between double complex *some_array_in_C99 and complex<double> *some_array_in_cpp?
I could just use void *pointers, but I'm not sure if that's good.
This may be nitpicking, because ctypes seems to work fine with complex<double>, but I'm worried about MATLAB and other possible numerical environments.
The C99 and C++0x standards both specify that their respective double complex types must have the same alignment and layout as an array of two doubles. This means that you can get away with passing arguments as a void * and have your routines be (relatively) easily callable from either language, and this is an approach that many libraries have taken.
The C++0x standard guarantees (§26.4) that a reinterpret_cast of std::complex<double>* to double* will do the right thing; if I remember correctly, this was not so clearly specified in earlier versions of the standard. If you are willing to target C++0x, it may be possible for you to use this to do something cleaner for your interfaces.
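A small sketch of what that guarantee allows, assuming a C++11-or-later implementation (scale_in_place is an invented helper):

#include <complex>
#include <cstddef>
#include <cstdio>

// Treat an array of std::complex<double> as a flat array of doubles
// (real, imag, real, imag, ...), which the standard's layout guarantee permits.
void scale_in_place(std::complex<double>* z, std::size_t n, double factor) {
    double* flat = reinterpret_cast<double*>(z);
    for (std::size_t i = 0; i < 2 * n; ++i)
        flat[i] *= factor;
}

int main() {
    std::complex<double> a[2] = {{1.0, 2.0}, {3.0, 4.0}};
    scale_in_place(a, 2, 10.0);
    std::printf("%g %g\n", a[0].real(), a[0].imag());  // prints 10 20
}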
Given that the actual layout and alignment specifications are defined to agree, I would be tempted to just condition the type in the header file on the language; your implementation can use either language, and the data will be laid out properly in memory either way. I'm not sure how MATLAB does things internally, though, so I don't know whether this is compatible with MATLAB; if they use the standard LAPACK approach, then it will be compatible on many, but not all, platforms and circumstances: LAPACK defines its own double complex type as a struct with two double members, which will usually be laid out the same way in memory (this isn't guaranteed), but could follow a different calling convention on some platforms.
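One possible shape for such a language-conditional header, as a sketch under the assumption that both languages' complex types share the two-double layout (all names here are invented for illustration):

/* mathlib.h -- sketch of a header usable from both C99 and C++ */
#ifndef MATHLIB_H
#define MATHLIB_H

#ifdef __cplusplus
  #include <complex>
  typedef std::complex<double> mathlib_cdouble;
  extern "C" {
#else
  #include <complex.h>
  typedef double complex mathlib_cdouble;
#endif

/* Both typedefs have the layout of two doubles (real then imaginary),
   so this one declaration serves callers in either language. */
void mathlib_scale(mathlib_cdouble *data, int n, double factor);

#ifdef __cplusplus
}  /* extern "C" */
#endif

#endif /* MATHLIB_H */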