I am trying to write a solid summary of C++ data types, but I have some confusion about the new data types.
As I understand from my reading about C++ data types, char16_t and char32_t are fundamental data types and have been part of the core language since C++11.
It is mentioned that they are distinct data types.
Q1: What exactly does "distinct" mean here?
Q2: Why was the intXX_t type family, such as int32_t, not made fundamental data types? And how is it beneficial to choose them instead of int?
To answer the second part of the question:
The fixed-size integer types are inherited from C, where they are typedefs. It was decided to keep them as typedefs for compatibility. Note that the C language doesn't have overloaded functions, so the need for "distinct" types is lower there.
One reason for using int32_t is that you need one or more of its required properties:
Signed integer type with width of exactly 32 bits
with no padding bits and using 2's complement for negative values.
If you use an int it might, for example, be 36 bits and use 1's complement.
However, if you don't have very specific requirements, using a normal int will work fine. One advantage is that an int will be available on all systems, while the 36-bit machine (or a 24-bit embedded processor) might not have any int32_t at all.
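For example, here is a minimal sketch of that trade-off (the serialization use case and the function names are only illustrative, not taken from any particular codebase):

#include <cstdint>
#include <cstring>

// Exact width matters: the wire format says "4 bytes, two's complement".
void write_length(std::int32_t length, unsigned char* wire) {
    std::memcpy(wire, &length, sizeof length);   // exactly 4 bytes, no padding
}

// Plain int is enough for a loop counter, and it exists on every platform,
// including ones (36-bit, 24-bit, ...) that may not provide int32_t at all.
int sum(const int* values, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += values[i];
    return total;
}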
The charXX_t types were introduced in N2249. They are created as a distinct type from uintXX_t to allow overloading:
Define char16_t to be a distinct new type, that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a distinct new type, that has the same size and representation as uint_least32_t.
[N1040 defined char16_t and char32_t as typedefs to uint_least16_t and uint_least32_t, which make overloading on these characters impossible.]
To answer your Q1:
Distinct type means std::is_same<char16_t,uint_least16_t>::value is equal to false.
So overloaded functions are possible.
(There is no difference in size, signedness, and alignment, though.)
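A small sketch of what this means in practice; the print function is just an illustration:

#include <cstdint>
#include <iostream>
#include <type_traits>

static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is a distinct type, not an alias");

void print(char16_t)            { std::cout << "UTF-16 code unit\n"; }
void print(std::uint_least16_t) { std::cout << "plain integer\n"; }

int main() {
    print(u'A');                     // picks the char16_t overload
    print(std::uint_least16_t{65});  // picks the integer overload
}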
Another way to express "distinct types" is that you can create two overloaded functions, one for each type. For instance:
typedef int Int;
void f(int) { /* impl_1 */ }
void f(Int) { /* impl_2 */ }
If you try to compile a snippet containing both functions, the compiler will complain about an ODR violation: you are trying to define the same function twice, since their parameter types are the same. That's because a typedef doesn't create a new type, it only creates an alias.
However, when types are truly distinct, both versions will be seen as two different overloads by the compiler.
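For instance, the following sketch compiles fine even on a platform where int and long happen to have the same size, because they are still distinct types:

#include <iostream>

void f(int)  { std::cout << "f(int)\n";  }
void f(long) { std::cout << "f(long)\n"; }

int main() {
    f(42);   // picks f(int)
    f(42L);  // picks f(long)
}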
I am looking through the source code of Chromium to study how they implemented MediaRecorder API that encodes/records raw mic input stream to a particular format.
I came across an interesting piece of code in their source. In short:
bool DoEncode(float* data_in, std::string* data_out) {
  ...
  data_out->resize(MAX_DATA_BYTES_OR_SOMETHING);
  opus_encode_float(
      data_in,
      reinterpret_cast<uint8_t*>(base::data(*data_out)));
  ...
}
So DoEncode (a C++ method) here accepts an array of floats and converts it to an encoded byte stream, and the actual operation is done in opus_encode_float() (which is a pure C function).
The interesting part is that the Google Chromium team used std::string for a byte array instead of std::vector<uint8_t>, and they even manually cast to a uint8_t buffer.
Why would the Chromium team do it this way, and is there a scenario where using std::string as a generic byte buffer is more useful than alternatives like std::vector<uint8_t>?
The Chromium coding style (see below) forbids using unsigned integral types without a good reason, and an external API is not such a reason. The sizes of signed and unsigned char are both 1, so why not.
I looked at the opus encoder API, and it seems that earlier versions used signed char:
[out] data char*: Output payload (at least max_data_bytes long)
Although the API uses unsigned chars now, the description still refers to signed char. So std::string of chars was more convenient for the earlier API, and the Chromium team didn't change the container that was already in use after the API was updated; they added a cast in one line instead of updating tens of other lines.
Integer Types
You should not use the unsigned integer types such as uint32_t, unless there is a valid reason such as representing a bit pattern rather than a number, or you need defined overflow modulo 2^N. In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this.
If your code is a container that returns a size, be sure to use a type that will accommodate any possible usage of your container. When in doubt, use a larger type rather than a smaller type.
Use care when converting integer types. Integer conversions and promotions can cause undefined behavior, leading to security bugs and other problems.
On Unsigned Integers
Unsigned integers are good for representing bitfields and modular arithmetic. Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler. In other cases, the defined behavior impedes optimization.
That said, mixing signedness of integer types is responsible for an equally large class of problems. The best advice we can provide: try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic). Do not use an unsigned type merely to assert that a variable is non-negative.
We can only theorize.
My speculation: they wanted to use the built-in small string optimization (SSO) that exists in std::string but is typically not available for std::vector<uint8_t>.
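To make the comparison concrete, here is a hedged sketch (not Chromium's actual code; the encode function is a made-up stand-in for the C encoder) showing that both containers need roughly the same amount of glue, with std::string requiring one cast:

#include <cstdint>
#include <string>
#include <vector>

// Made-up stand-in for a C encoder: writes into a byte buffer and returns
// the number of bytes produced (illustration only, not the opus API).
int encode(const float* /*in*/, int /*n*/, std::uint8_t* out, int max_bytes) {
    if (max_bytes > 0) out[0] = 0;  // pretend one byte was written
    return max_bytes > 0 ? 1 : 0;
}

void with_string(const float* in, int n, std::string* out) {
    out->resize(4000);  // placeholder capacity
    // One cast bridges the char-based container and the byte-oriented API.
    int written = encode(in, n,
                         reinterpret_cast<std::uint8_t*>(&(*out)[0]),
                         static_cast<int>(out->size()));
    out->resize(written);
}

void with_vector(const float* in, int n, std::vector<std::uint8_t>* out) {
    out->resize(4000);  // placeholder capacity
    int written = encode(in, n, out->data(), static_cast<int>(out->size()));
    out->resize(written);
}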
I know that it might not be important to know, but this is purely out of curiosity. I've looked everywhere on the internet and every website had a different number, which was really frustrating. This website (https://en.cppreference.com/w/cpp/language/types) shows 28 primitive data types for C++ while others show different numbers. Can anyone help me with this?
It really depends on how you count the data types. This web site lists these 7:
bool
char
int
float
double
void
wchar_t
However, these types can be modified with signed, unsigned, short, and long. The site that you mentioned lists all of these, plus the new ones like char16_t and char32_t. I think the list of 28 is very comprehensive, and I can't think of any that have been omitted (they've even covered unsigned long long int).
So, 28 looks right to me. The reason other sites may have different numbers is that they don't include the new types, or they don't count all of the modifier combinations. Other sites may consider unsigned short int different from short unsigned int, but the two are equivalent.
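You can verify that equivalence with a quick check; the static_asserts below are only illustrative:

#include <type_traits>

static_assert(std::is_same<unsigned short int, short unsigned int>::value,
              "just two spellings of the same type");
static_assert(std::is_same<unsigned long long int, long long unsigned>::value,
              "modifier order never creates a new type");

int main() {}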
Primitive Data Types: These data types are built-in or predefined data types and can be used directly by the user to declare variables. Examples: int, char, float, bool, etc.
Primitive data types available in C++ are:
Integer
Character
Boolean
Floating Point
Double Floating Point
Valueless or Void
Wide Character
You may think that short int and long int are primitive data types, but they are actually the primitive data type int combined with the data type modifiers short and long.
Datatype Modifiers: As the name implies, datatype modifiers are used with the built-in data types to modify the length of data that a particular data type can hold.
Data type modifiers available in C++ are:
Signed
Unsigned
Short
Long
Hopefully this gives you a helpful answer.
"Primitive data type" is not a term that the standard specifies, so you might get a different answer depending on who you ask. Clang defines the following types as "built-in", meaning that they aren't derived from any other type:
void
bool
std::nullptr_t
float
double
long double
char16_t
char32_t
signed and unsigned variants of:
char
wchar_t
short
int
long
long long
The list contains more, but I believe that those are the only ones that are specified in standard C++.
The standard has essentially the same thing in [basic.fundamental] (calling these "fundamental types"), but the list isn't as convenient to navigate.
That would be a total of 20 primitive types (ignoring that char and wchar_t are treated separately from their explicitly signed/unsigned variants, because their default signedness is platform-dependent).
The standard also allows implementations to have "extended" signed and unsigned integer types. For instance, Clang supports a signed and unsigned __int128_t, which would fall in that category, but it isn't required by the standard.
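As a rough way to probe this list on a given compiler, here is a small sketch using type traits (the chosen types are just a sample):

#include <cstddef>
#include <type_traits>

static_assert(std::is_fundamental<void>::value, "built-in");
static_assert(std::is_fundamental<bool>::value, "built-in");
static_assert(std::is_fundamental<std::nullptr_t>::value, "built-in");
static_assert(std::is_fundamental<long double>::value, "built-in");
static_assert(std::is_fundamental<char32_t>::value, "built-in");

// char, signed char and unsigned char are three distinct types, even though
// char is represented like one of the other two on any given platform.
static_assert(!std::is_same<char, signed char>::value, "distinct");
static_assert(!std::is_same<char, unsigned char>::value, "distinct");

int main() {}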
Which types should I use when programming C++ on Linux? Is it good idea to use types from stdint.h, such as int16_t and uint8_t?
On one hand, surely stdint.h won't be available for programming on Windows. On the other hand, the size of e.g. short isn't clear at first glance. And it's even more intuitive to write int8_t instead of char...
Does the C++ standard guarantee that the sizes of the standard types will stay unchanged in the future?
First off, Microsoft's implementation does support <stdint.h>.
Use the appropriate type for what you're doing.
If you need, for example, an unsigned type that's exactly 16 bits wide with no padding bits, use uint16_t, defined in <stdint.h>.
If you need an unsigned type that's at least 16 bits wide, you can use uint_least16_t, or uint_fast16_t, or short, or int.
You probably don't need exact-width types as often as you think you do. Very often what matters is not the exact size of a type, but the range of values it supports. But exact representation is important when you're interfacing to some externally defined data format. In that case, you should already have declarations that tell you what types to use.
There are specific requirements on the ranges of the predefined types: char is at least 8 bits, short and int are at least 16 bits, long is at least 32 bits, and long long is at least 64 bits. Also, short is at least as wide as char, int is at least as wide as short, and so forth. (The standard specifies minimum ranges, but the minimum sizes can be derived from the ranges and the fact that a binary representation is required.)
Note that <stdint.h> is a C header. If you #include it in a C++ program, the type names will be imported directly into the global namespace, and may or may not also be imported into the std namespace. If you #include <cstdint>, then the type names will be imported into the std namespace, and may or may not also be imported into the global namespace. Macro names such as UINT32_MAX are not in any namespace; they're always global. You can use either version of the header; just be consistent about using or not using the std:: prefix.
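A minimal sketch of that advice, assuming a C++11 or later compiler: qualify the type names with std::, which is guaranteed to work, and note that the macros stay global:

#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t value = UINT32_MAX;   // macro names are always global
    std::int16_t  small = -1234;
    std::cout << value << ' ' << small << '\n';
}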
The C++ standard does not specify much about the sizes of integer types (such as int, long, or char). If you want to be sure that a certain type has a fixed size across platforms, you can use C++11's fixed-width integer types, which are standardized and guaranteed to have a given size.
To use them, #include <cstdint>.
Does the C++ standard guarantee that the sizes of the standard types will stay unchanged in the future?
Not likely. On 8-bit computers, the sizes of integer types were different from what they are today. In the future, in 2042, with 1024-bit computers, I would expect long long to be 1024 bits long.
However, we can be almost absolutely sure that std::uint32_t will stay 32 bits wide.
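If you want that guarantee spelled out in code, a static_assert like the following (just a sketch) can never fire on a conforming implementation that provides std::uint32_t:

#include <climits>
#include <cstdint>

static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32,
              "std::uint32_t is exactly 32 bits wide, with no padding bits");

int main() {}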
When being used as template parameters, are integer types of the same length and the same signedness considered equal, i.e., do they produce the same template class when being used as a template parameter? Which paragraph of the spec handles this case?
E.g., consider I am on an architecture on which unsigned and unsigned long are both 32 bits wide; will, for example, a vector<unsigned> then be a different class from a vector<unsigned long>, or will they be treated as the same type?
The types unsigned int and unsigned long are guaranteed to be different types. This is clarified by a note in the standard:
Even if the implementation defines two or more basic types to have the same value representation, they are nevertheless different types.
([basic.fundamental]/11)
In general, two types are only the same if one is aliased to the other (i.e., with typedef or using) or if both are aliased to the same type.
Given that unsigned int and unsigned long are different types, vector<unsigned int> and vector<unsigned long> are also different types, even if the two classes have identical layouts.
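A short check of that conclusion (only a sketch; it holds even on a platform where both types are 32 bits wide):

#include <type_traits>
#include <vector>

static_assert(!std::is_same<unsigned int, unsigned long>::value,
              "always distinct types, whatever their sizes");
static_assert(!std::is_same<std::vector<unsigned int>,
                            std::vector<unsigned long>>::value,
              "therefore distinct instantiations");

int main() {}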
If you do a blind type-cast between different integer types, the cast will work sometimes, but not necessarily every time. This is because different types have different minimum and maximum values. Type casting is allowed by most compilers, but they use rules that you may not be familiar with, resulting in potentially unexpected behaviour. The best thing to do is to research these types and then ensure that any code you write that converts between them prevents unpredictable results.
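Here is a hedged sketch of the kind of surprise meant above; the concrete values are only examples:

#include <cstdint>
#include <iostream>

int main() {
    std::int32_t big = 70000;
    std::int16_t narrow = static_cast<std::int16_t>(big);  // value does not fit
    std::cout << narrow << '\n';    // typically prints 4464, not 70000

    std::int32_t negative = -1;
    std::uint32_t wrapped = static_cast<std::uint32_t>(negative);
    std::cout << wrapped << '\n';   // prints 4294967295: modular wrap-around
}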
Possible Duplicate:
What does a type followed by _t (underscore-t) represent?
Does anyone know what the 't' in time_t, uint8_t, etc. stands for? Is it "type"?
Second, why declare this kind of new type? For instance, couldn't size_t just be an int?
Yes, the t is for Type.
The reason for defining the new types is so that they can change in the future. As 64-bit machines have become the norm, implementations can change the bit width of size_t to 64 bits instead of just 32. It's a way to future-proof your programs. Some small embedded processors only handle 16-bit numbers well; their size_t might only be 16 bits wide.
An especially important one might be ptrdiff_t, which represents the difference between two pointers. If the pointer size changes (say to 64 or 128 bits) sometime in the future, your program should not care.
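A minimal sketch of that point; the buffer size is arbitrary:

#include <cstddef>
#include <iostream>

int main() {
    int buffer[100] = {};
    int* first = buffer;
    int* last  = buffer + 100;      // one past the end is a valid pointer value

    std::ptrdiff_t distance = last - first;
    std::cout << distance << '\n';  // prints 100 regardless of pointer width
}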
Another reason for the typedefs is stylistic. While size_t might just be defined by
typedef int size_t;
using the name size_t clearly shows that the variable is meant to be the size of something (a container, a region of memory, etc.).
I think it stands for "type": a type which is possibly a typedef of some other type. So when we see int, we can assume that it is not a typedef of any type, but when we see uint32_t, it is most likely a typedef of some type. This is not a rule, just my observation, though there is one exception: wchar_t is not a typedef of any other type, yet it has _t.
Yes, it probably stands for type or typedef, or something like that.
The idea behind those typedefs is that you are specifying exactly that the variable is not a generic int, but the size of an object/the number of seconds since the UNIX epoch/whatever; also, the standard makes specific guarantees about the characteristics of those types.
For example, size_t is guaranteed to be able to hold the size of the biggest object you can create in C, and the type that can do this varies with the platform (on Win32 unsigned long is enough, on Win64 you need unsigned long long, while on some microcontrollers with really small memory an unsigned short may suffice).
As for the various [u]intNN_t types, they are fixed-size integer types: while for "plain" int/short/long/... the standard does not mandate a specific size, you will often need a type that is guaranteed to have a specific size wherever you compile your program (e.g. if you are reading a binary file); those typedefs are the solution to this need. (By the way, there are also typedefs for the "fastest integer of at least some size", for when you just need a minimum guaranteed range; see the sketch below.)
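To illustrate the distinction (the struct and function names are made up): exact-width types where an external layout demands them, and the "least"/"fast" variants where only a guaranteed minimum range matters:

#include <cstddef>
#include <cstdint>

struct DiskRecord {
    std::int32_t  id;       // on-disk layout: exactly 32 bits per field
    std::uint32_t flags;
};

std::uint_fast32_t count_flagged(const DiskRecord* records, std::size_t n) {
    std::uint_fast32_t total = 0;   // at least 32 bits, whatever is fastest here
    for (std::size_t i = 0; i < n; ++i)
        if (records[i].flags != 0) ++total;
    return total;
}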