meaning of known data types [duplicate] - c++

Possible Duplicate:
What does a type followed by _t (underscore-t) represent?
Does anyone know what the 't' in time_t, uint8_t, etc. stands for? Is it "type"?
Second, why declare these kinds of new types? For instance, couldn't size_t just be an int?

Yes, the t is for Type.
The reason for defining the new types is so they can change in the future. As 64-bit machines have become the norm, it's possible for implementations to change the bit-width of size_t to 64 bits instead of just 32. It's a way to future-proof your programs. Some small embedded processors only handle 16 bit numbers well. Their size_t might only be 16 bits wide.
An especially important one might be ptrdiff_t, which represents the difference between two pointers. If the pointer size changes (say to 64 or 128 bits) sometime in the future, your program should not care.
Another reason for the typedefs is stylistic. While size_t might simply be defined as
typedef unsigned int size_t;
using the name size_t clearly shows that the variable is meant to hold the size of something (a container, a region of memory, etc.).
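To make that concrete, here is a minimal hedged sketch (the function names are made up for illustration) showing size_t carrying an object size and ptrdiff_t carrying a pointer difference, so the intent is visible in the signature regardless of how wide those types are on a given platform:
#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <vector>

// The parameter types document "this is a size", not "this is some int".
bool fits_in_buffer(std::size_t object_size, std::size_t buffer_size) {
    return object_size <= buffer_size;
}

// Pointer subtraction yields std::ptrdiff_t, whatever width pointers are.
std::ptrdiff_t element_distance(const int* first, const int* last) {
    return last - first;
}

int main() {
    std::vector<int> v(100);
    bool ok = fits_in_buffer(v.size() * sizeof(int), 4096);
    std::ptrdiff_t d = element_distance(v.data(), v.data() + v.size());
    return (ok && d == 100) ? 0 : 1;
}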

I think it stands for type: a type which is possibly a typedef of some other type. So when we see int, we can assume that it is not a typedef of any type, but when we see uint32_t, it is most likely a typedef of some type. This is not a rule, just my observation, though there is one exception: wchar_t is not a typedef of any other type, yet it still has _t.

Yes, it probably stands for type or typedef, or something like that.
The idea behind those typedefs is that you are specifying exactly that a variable is not a generic int but the size of an object, or the number of seconds since the UNIX epoch, or whatever; the standard also makes specific guarantees about the characteristics of those types.
For example, size_t is guaranteed to be able to hold the size of the biggest object you can create in C, and the type that can do this changes depending on the platform (on Win32 unsigned long is enough, on Win64 you need unsigned long long, while on some microcontrollers with really small memory an unsigned short may suffice).
As for the various [u]intNN_t, they are fixed-size integer types: while for "plain" int/short/long/... the standard does not mandate a specific size, you often need a type that is guaranteed to have a specific size wherever you compile your program (e.g. if you are reading a binary file); those typedefs are the solution for that necessity. (By the way, there are also typedefs for the "fastest integer of at least some size", for when you just need a minimum guaranteed range.)
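A minimal sketch of the binary-file case, assuming a little-endian format with a single 32-bit field (the file name and layout are made up for illustration); uint32_t guarantees the in-memory width matches the on-disk width, which plain int or long would not:
#include <cstdint>
#include <cstdio>

int main() {
    std::FILE* f = std::fopen("data.bin", "rb");   // hypothetical input file
    if (!f) return 1;
    std::uint32_t field = 0;                       // exactly 32 bits, everywhere
    std::size_t n = std::fread(&field, sizeof field, 1, f);
    std::fclose(f);
    // Note: this still assumes the file's byte order matches the host's.
    return n == 1 ? 0 : 1;
}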

Related

What is the most modern and idiomatic way to define a type-safe "byte" (as in 8 bits) type in C++?

I want to define a byte type in my C++ program, basically an unsigned char. What is the most idiomatic way to go about doing this?
I want to define a byte type to abstract away the different representations and make it possible to create type-safe arrays of this new byte (8-bit) type, backed by an unsigned char, for a bit manipulation library I am working on for a very specific use case of a program I am creating. I want it to be very explicit that this is an 8-bit byte specific to the domain of my program and that it is not subject to varying implementations based on platform or compiler.
char, unsigned char, and signed char are all one byte; std::uint8_t (from <cstdint>) is an 8-bit byte (a signed variant exists too). That last one only exists on systems that have 8-bit bytes. There is also std::uint_least8_t (from the same header), which has at least 8 bits, and std::uint_fast8_t, which has at least 8 bits and is supposed to be the most efficient one.
The most idiomatic way is to just use signed char or unsigned char. You can use typedef if you want to call it byte or if you need it to be strongly typed you could use BOOST_STRONG_TYPEDEF.
If you need it to be exactly 8 bits, you can use uint8_t from <cstdint> but it is not guaranteed to exist on all platforms.
To be honest, this is one of the most irritating "features" in C++ for me.
Yes, you can use std::uint8_t or unsigned char, and on most systems the former is a typedef of the latter.
But... this is not type safe, as a typedef does not create a new type, and the committee refused to add a "strong typedef" to the standard.
Consider:
void foo (std::uint8_t);
void foo (unsigned char); // oops, on most platforms this declares the same function
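As a hedged sketch of one workaround (this is essentially the approach C++17 later standardized as std::byte in <cstddef>): a scoped enum with unsigned char as its underlying type is a genuinely distinct type, so the two overloads above no longer collide.
#include <cstdint>

// A distinct 8-bit byte type: a scoped enum is not a typedef,
// so it does not implicitly convert to or from unsigned char.
enum class Byte : unsigned char {};

void foo(std::uint8_t) {}   // arithmetic byte
void foo(Byte) {}           // domain byte: now a genuinely separate overload

int main() {
    foo(std::uint8_t{42});          // picks the first overload
    foo(static_cast<Byte>(0x2A));   // explicit conversion required for the second
}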
I am currently using the uint8_t approach. The way I see it is, if a platform does not have an 8 bit type (in which case my code will not function on that platform), then I don't want it to be running anyways, because I would end up with unexpected behaviour due to the fact that I am processing data with the assumption that it is 8 bits, when in fact it is not. So I don't see why you should use unsigned char, assume it is 8 bits, and then perform all your calculations based on that assumption. It's just asking for trouble in my opinion.

converting size_t into long, Is there any disadvantage?

Is there any disadvantage of converting size_t to long? I am writing a program that maintains a linked list in a file, so I traverse to another node based on a size_t offset, and I also keep track of the total number of lists as a size_t. Hence, obviously, there is going to be some conversion or addition of long and size_t. Is there any disadvantage to this? If there is, then I will make everything long instead of size_t, even the sizes. Please advise.
The "long" type, unfortunately, doesn't have a good theoretical basis. Originally it was introduced on 32 bit unix ports to differentiate it from the 16 bit "int" assumed by the existing PDP11 software. Then later "int" was changed to 32 bits on those platforms (and "short" was introduced) and "long" and "int" became synonyms, which they were for a very long time.
Now, on 64 bit unix-like platforms (Linux, the BSDs, OS X, iOS and whatever proprietary unixes people might still care about) "long" is a 64 bit quantity. But, sadly, not on windows: there was too much legacy "code" in the existing headers that made the sizeof(int)==sizeof(long) assumption, so they went with an abomination called "LLP64" and left long as 32 bits. Sigh.
But "size_t" isn't like that. It has always meant precisely one thing: it's the unsigned type that stores the native pointer size in the address space. If you have an unsigned (! -- use ssize_t or ptrdiff_t if you need signed arithmetic) pointer that needs an integer representation (i.e. you need to store the memory size of an object), this is what you use.
It's not a problem now, but it may be in the future depending on where you'll port your app. That's because size_t is defined to be large enough to store offsets of pointers, so if you have a 64-bit pointer, size_t will be 64 bits too. Now, long may or may not be 64 bits, because the size rules for fundamental types in C/C++ give room to some variations.
But if you're going to write these values to a file, you have to choose a specific size anyway, so there's no option other than to convert to long (or long long, if needed). Better yet, use one of the size-specific types like int32_t.
My advice: somewhere in the header of your file, store the sizeof for the type you converted the size_t to. By doing that, if in the future you decide to use a larger one, you can still support the old size. And for the current version of the program, you can check if the size is supported or not, and issue an error if not.
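A hedged sketch of that advice (the header layout and field names are made up for illustration): record the width of the count field in the file header, and write counts through a fixed-width type rather than raw size_t. A real format would also pin down byte order and struct padding, which this sketch glosses over.
#include <cstdint>
#include <cstdio>

// Hypothetical file header: records how wide the stored count is,
// so a future version using a larger type can still recognize old files.
struct FileHeader {
    std::uint8_t  count_width;   // sizeof the on-disk count type, in bytes
    std::uint64_t node_count;    // fixed 64-bit on disk, regardless of size_t
};

bool write_header(std::FILE* f, std::size_t nodes) {
    FileHeader h;
    h.count_width = sizeof h.node_count;
    h.node_count  = static_cast<std::uint64_t>(nodes);  // size_t -> fixed width
    return std::fwrite(&h, sizeof h, 1, f) == 1;
}

int main() {
    std::FILE* f = std::fopen("list.dat", "wb");   // hypothetical file name
    if (!f) return 1;
    bool ok = write_header(f, 42);
    std::fclose(f);
    return ok ? 0 : 1;
}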
Is there any disadvantage of converting size_t to long?
Theoretically long can be smaller than size_t. Also, long is signed while size_t is unsigned, so if you start using them both in the same expression, a compiler like g++ will complain about it. A lot. Theoretically it might lead to unexpected errors due to signed-to-unsigned assignments.
obviously there is going to be some conversion or addition of long
I don't see why there is supposed to be some conversion or addition to long. You can keep using size_t for all arithmetic operations. You can typedef it as "ListIndex" or whatever and keep using it throughout the code. If you mix types (long and size_t), g++/mingw will nag you to death about it.
Alternatively, you could select a specific type with a guaranteed size. Newer compilers have the cstdint header, which includes types like uint64_t (it is extremely unlikely that you will encounter a file larger than 2^64 bytes, for example). If your compiler doesn't have the header, it should be available in Boost.
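A minimal sketch of the "ListIndex" suggestion above (the alias names are illustrative): keep one alias for in-memory indexing and a separate fixed-width alias for anything that goes into the file, so the only size_t conversion is one explicit, visible cast.
#include <cstddef>
#include <cstdint>

using ListIndex   = std::size_t;     // in-memory indexing and arithmetic
using OnDiskIndex = std::uint64_t;   // what actually gets written to the file

ListIndex advance(ListIndex current, ListIndex step) {
    return current + step;           // stays size_t throughout; no mixing with long
}

int main() {
    ListIndex i = advance(10, 5);
    OnDiskIndex stored = static_cast<OnDiskIndex>(i);   // the one explicit conversion
    return stored == 15 ? 0 : 1;
}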

Explanation for Why int should be preferred over long/short in c++?

I read somewhere that the int data type gives better performance (compared to long and short) regardless of the OS, as its size gets adapted to the word size of the machine, whereas long and short occupy 4 and 2 bytes, which may or may not match the word size.
Could anyone give a good explanation of this?
From the standard, 3.9.1 §2:
"There are five signed integer types: "signed char", "short int", "int", "long int", and "long long int". In this list, each type provides at least as much storage as those preceding it in the list. Plain ints have the natural size suggested by the architecture of the execution environment (44); the other signed integer types are provided to meet special needs."
So you can say char <= short <= int <= long <= long long.
But you cannot tell that a short is 2 bytes and a long 4.
Now to your question: most compilers align int with the register size of their target platform, which makes alignment easier and access faster on some platforms. But that does not mean you should always prefer int.
Take the data type according to your needs. Do not optimize without measuring performance.
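As a hedged illustration, the ordering guarantee above can be checked at compile time, while exact byte counts cannot be assumed; only the minimum widths implied by the guaranteed ranges are portable:
#include <climits>   // CHAR_BIT

// The standard guarantees the ordering of sizes, not the exact values.
static_assert(sizeof(short) <= sizeof(int),       "ordering guaranteed by the standard");
static_assert(sizeof(int)   <= sizeof(long),      "ordering guaranteed by the standard");
static_assert(sizeof(long)  <= sizeof(long long), "ordering guaranteed by the standard");

// Minimum ranges imply minimum widths: short/int at least 16 bits, long 32, long long 64.
static_assert(sizeof(short) * CHAR_BIT >= 16,     "short is at least 16 bits");
static_assert(sizeof(long)  * CHAR_BIT >= 32,     "long is at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");

// Whether short is exactly 2 bytes or long exactly 4 is platform-specific.
int main() {}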
int is traditionally the most "natural" integral type for the machine on which the program is to run. What is meant by "most natural" is not too clear, but I would expect that it would not be slower than other types. More to the point, perhaps, is that there is an almost universal tradition for using int in preference to other types when there is no strong reason for doing otherwise. Using other integral types will cause an experienced C++ programmer, on reading the code, to ask why.
short only optimizes storage size; calculations always promote to int, if applicable (i.e. unless short is already the same size).
I'm not sure that int should always be preferred to long; the obvious exception is when int's capacity doesn't suffice.
You already mention the native word size, so I'll leave that.
Eskimos reportedly use forty or more different words for snow. When you only want to communicate that it's snow, then the word "snow" suffices. Source code is not about instructing the compiler: it's about communicating between humans, even if the communication may only be between your current and somewhat later self…
Cheers & hth.
int does not give better performance than the other types. Really, on most modern platforms, all of the integer types will perform similarly, excepting long long. If you want the "fastest" integer available on your platform, C++ does not give you a way to do that.
On the other hand, if you're willing to use things defined by C99, you can use one of the "fast int" types defined there (int_fast32_t and friends).
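A hedged sketch of what that looks like; the type names below come from C99's <stdint.h>, and C++11's <cstdint> exposes the same names, which the sketch uses:
#include <cstdint>
#include <cstdio>

int main() {
    // "At least 32 bits, and the fastest such type on this platform."
    std::int_fast32_t counter = 0;
    // "At least 16 bits, the smallest such type" (optimizes storage instead).
    std::int_least16_t smallest = 0;
    for (std::int_fast32_t i = 0; i < 1000; ++i)
        counter += i;
    std::printf("sizeof(int_fast32_t) = %zu\n", sizeof counter);
    return (counter == 499500 && smallest == 0) ? 0 : 1;
}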
Also, on modern machines, memory hierarchy is more important than CPU calculations in most cases. Using smaller integer types lets you fit more integers into CPU cache, which will increase performance in pretty much all cases.
Finally, I would recommend not using int as a default data type. Usually, I see people reach for int when they really want an unsigned integer instead. The conversion from signed to unsigned can lead to subtle integer overflow bugs, which can lead to security vulnerabilities.
Don't choose the data type because of an intrinsic "speed" -- choose the right data type to solve the problem you're looking to solve.

How do all the different size types relate to each other?

Currently I have a scenario where I want to check whether writing a given string to a file stream will grow the file beyond a given size (this is used for logfile rotation). Now, std::ofstream::tellp() returns a streampos, but std::string::size() returns a size_t. The effect is that this does not work:
out_stream.tellp() + string.size() < limit
because apparently there is an ambiguous overload of operator + for these types. This leads me to two questions:
How can I resolve the above ambiguity?
How do all the different types (size_t, streamsize, streampos, streamoff) relate to each other? When can they be safely converted, and what are the possible pitfalls? I am generally confused about these types. All I know is that they are implementation dependent and that they make certain guarantees (e.g. size_t is always large enough to hold the size of the largest object that would fit into memory on the architecture for which the application was compiled), but what are the guarantees concerning interoperability of these types (see the example above, or comparing a streamsize to a size_t)?
You should be able to convert the result from tellp to a std::string::size_type by casting.
static_cast<std::string::size_type>(out_stream.tellp()) + string.size() < limit
EDIT: This is safe because your stream offset will never be negative and will safely convert to an unsigned value.
The real question is: what is the type of limit? The usual way of testing whether there is still room is:
limit - out_stream.tellp() >= string.size()
But you have to ensure that limit has a type from which out_stream.tellp() can be subtracted.
In theory, streampos is neither required to be convertible to nor comparable with an integral type, nor is such a conversion required to give meaningful information, nor need it support subtraction or comparison, for that matter. In practice, I don't think you have to worry too much about the conversion to an integral type existing and being monotonic (although perhaps on some exotic mainframe...). But you can't be sure that arithmetic with it will work, so I'd probably prefer converting it explicitly to a streamsize (which is guaranteed to be a signed integral type). (Regardless of how you approach the problem, you'll have to deal with the fact that string.size() returns a size_t, which is required to be unsigned, whereas streamsize is required to be signed.)
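Putting that together, a hedged sketch of the rotation check (the function and variable names are illustrative): it converts tellp() to a std::streamoff once and keeps the comparison in signed arithmetic.
#include <fstream>
#include <ios>       // std::streamoff
#include <string>

// Returns true if appending `line` would push the log past `limit` bytes.
bool would_exceed(std::ofstream& out, const std::string& line, std::streamoff limit) {
    std::streamoff pos = static_cast<std::streamoff>(out.tellp());
    if (pos < 0)                      // tellp() reports failure as -1
        return true;                  // be conservative: rotate on error
    // line.size() is a size_t; bring it into the signed domain explicitly.
    return pos + static_cast<std::streamoff>(line.size()) > limit;
}

int main() {
    std::ofstream log("app.log", std::ios::app);   // hypothetical log file
    std::string msg = "hello\n";
    return would_exceed(log, msg, 1024) ? 1 : 0;
}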
With regards to your second question:
size_t is a typedef for an unsigned integral type, large enough to specify the size of any possible object;
streamsize is a typedef for a signed integral type, large enough to specify the size of an "object" in a stream;
streamoff is a typedef for an integral type capable of specifying the position of a byte in a file; and
streampos is a typedef for fpos<something>, where something is a type which can be used to maintain the state in the case of a multibyte stream.
The standard makes very few requirements concerning the relationships between them (and some of the few it makes are mathematically impossible to realize), so you're pretty much on your own.
I believe the standard says that streamsize is implementation-specific, so no help there. For a practical answer, you can check the headers where these are typedefed.
Considering that size_t might be 4 bytes while your application could conceivably operate on a stream of more than 4 GB in length, I believe that for an airtight solution you should cast to a type of known, sufficient size when making these types interoperate.
Of course, if you know (maybe with a compile-time assertion) that size_t or streamsize is 8 bytes long, you can use that type directly. If you have a stream whose length doesn't fit in 8 bytes, you have more serious problems than casting to the right type.
If you have big sizes, isn't unsigned long long the best you can get? If that isn't big enough, what else is?

Why is uint_8 etc. used in C/C++?

I've seen some code where they don't use primitive types int, float, double etc. directly.
They usually typedef it and use it or use things like
uint_8 etc.
Is it really necessary even these days? Or are C and C++ standardized enough that it is preferable to use int, float, etc. directly?
Because the types like char, short, int, long, and so forth, are ambiguous: they depend on the underlying hardware. Back in the days when C was basically considered an assembler language for people in a hurry, this was okay. Now, in order to write programs that are portable -- which means "programs that mean the same thing on any machine" -- people have built special libraries of typedefs and #defines that allow them to make machine-independent definitions.
The secret code is really quite straightforward. Here, you have uint_8, which is interpreted as:
u for unsigned
int to say it's treated as a number
_8 for the size in bits.
In other words, this is an unsigned integer with 8 bits (minimum) or what we used to call, in the mists of C history, an "unsigned char".
uint8_t is rather useless, because due to other requirements in the standard, it exists if and only if unsigned char is 8-bit, in which case you could just use unsigned char. The others, however, are extremely useful. int is (and will probably always be) 32-bit on most modern platforms, but on some ancient stuff it's 16-bit, and on a few rare early 64-bit systems, int is 64-bit. It could also of course be various odd sizes on DSPs.
If you want a 32-bit type, use int32_t or uint32_t, and so on. It's a lot cleaner and easier than all the nasty legacy hacks of detecting the sizes of types and trying to use the right one yourself...
Most code I read, and write, uses the fixed-size typedefs only when the size is an important assumption in the code.
For example if you're parsing a binary protocol that has two 32-bit fields, you should use a typedef guaranteed to be 32-bit, if only as documentation.
I'd only use int16 or int64 when the size must be that, say for a binary protocol or to avoid overflow or keep a struct small. Otherwise just use int.
If you're just doing "int i" to use i in a for loop, then I would not write "int32" for that. I would never expect any "typical" (meaning "not weird embedded firmware") C/C++ code to see a 16-bit "int," and the vast majority of C/C++ code out there would implode if faced with 16-bit ints. So if you start to care about "int" being 16 bit, either you're writing code that cares about weird embedded firmware stuff, or you're sort of a language pedant. Just assume "int" is the best int for the platform at hand and don't type extra noise in your code.
The sizes of types in C are not particularly well standardized. 64-bit integers are one example: a 64-bit integer could be long long, __int64, or even int on some systems. To get better portability, C99 introduced the <stdint.h> header, which has types like int32_t to get a signed type that is exactly 32 bits; many programs had their own, similar sets of typedefs before that.
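A hedged sketch of the contrast (the pre-C99 snippet in the comment is a simplified stand-in for the homegrown typedef headers mentioned above, not any particular project's code):
// Before <stdint.h>/<cstdint>, projects rolled their own, e.g. (simplified):
//   #if defined(_MSC_VER)
//     typedef __int64 my_int64;
//   #else
//     typedef long long my_int64;
//   #endif
// With the standard header, the exact-width names are already there:

#include <climits>
#include <cstdint>

std::int32_t  packet_length = 0;   // exactly 32 bits, signed
std::uint64_t file_offset   = 0;   // exactly 64 bits, unsigned

static_assert(sizeof(packet_length) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
static_assert(sizeof(file_offset)   * CHAR_BIT == 64, "uint64_t is exactly 64 bits");

int main() { return 0; }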
C and C++ purposefully don't define the exact size of an int. This is because of a number of reasons, but that's not important in considering this problem.
Since int isn't set to a standard size, those who want a standard size must do a bit of work to guarantee a certain number of bits. The code that defines uint_8 does that work, and without it (or a technique like it) you wouldn't have a means of defining an unsigned 8 bit number.
The width of primitive types often depends on the system, not just the C++ standard or compiler. If you want true consistency across platforms when you're doing scientific computing, for example, you should use the specific uint_8 or whatever so that the same errors (or precision errors for floats) appear on different machines, so that the memory overhead is the same, etc.
C and C++ don't restrict the exact size of the numeric types; the standards only specify a minimum range of values that has to be represented. This means that int can be larger than you expect.
The reason for this is that often a particular architecture will have a size for which arithmetic works faster than other sizes. Allowing the implementor to use this size for int and not forcing it to use a narrower type may make arithmetic with ints faster.
This isn't going to go away any time soon. Even once servers and desktops are all fully transitioned to 64-bit platforms, mobile and embedded platforms may well be operating with a different integer size. Apart from anything else, you don't know what architectures might be released in the future. If you want your code to be portable, you have to use a fixed-size typedef anywhere that the type size is important to you.