I have an API like this:
class IoType {
    ......
    StatusType writeBytes(......, size_t& bytesWritten);
    StatusType writeObjects(......, size_t& objsWritten);
};
A senior member of the team whom I respect seems to have a problem with the type size_t and suggests that I use C99 types instead. I know it sounds stupid, but I always think C99 types like uint32_t and uint64_t look ugly. I do use them, but only when it's really necessary; for instance, when I need to serialize/deserialize a structure, I do want to be specific about the sizes of my data members.
What are the arguments against using size_t? I know it's not a "real" fixed-width type, but I know for sure that even a 32-bit integer is enough for me, and a size type seems appropriate for a count of bytes or objects, etc.
Use exact-size types like uint32_t whenever you're dealing with serialization of any sort (binary files, networking, etc.). Use size_t whenever you're dealing with the size of an object in memory; that's what it's intended for. All of the standard facilities that deal with object sizes, such as malloc, strlen, and the sizeof operator, use size_t.
If you use size_t correctly, your program will be maximally portable, and it will not waste time and memory on platforms where it doesn't need to. On 32-bit platforms, a size_t will be 32 bits—if you instead used a uint64_t, you'd waste time and space. Conversely, on 64-bit platforms, a size_t will be 64 bits—if you instead used a uint32_t, your program could behave incorrectly (maybe even crash or open up a security vulnerability) if it ever had to deal with a piece of memory larger than 4 GB.
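To make that split concrete, here is a minimal sketch (the RecordHeader and appendPayload names are made up for illustration): fixed-width fields describe the wire format, while the in-memory byte count stays a size_t.

#include <cstddef>
#include <cstdint>
#include <vector>

// Wire format: exact sizes, so every platform reads/writes the same bytes.
struct RecordHeader {
    uint32_t payloadLength;  // always 4 bytes on the wire
    uint16_t recordType;     // always 2 bytes on the wire
};

// In-memory size: size_t, matching what sizeof, malloc, and strlen use.
void appendPayload(std::vector<unsigned char>& buffer,
                   const void* data, std::size_t length)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    buffer.insert(buffer.end(), p, p + length);
}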
I can't think of anything wrong with using size_t in contexts where you don't need to serialize values. Using size_t correctly will also increase the code's safety and portability across 32- and 64-bit platforms.
Uhm, it's not a good idea to replace size_t (a maximally portable thing) with a less portable C99 fixed size or minimum size unsigned type.
On the other hand, you can avoid a lot of technical problems (wasted time) by using the signed ptrdiff_t type instead. The standard library's use of unsigned types is purely historical: it made sense in its day, and still does on 16-bit architectures, but generally it is nothing but trouble and verbosity.
Making that change requires some support, though, in particular a general size function that returns array or container size as ptrdiff_t.
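A minimal sketch of such a function (the n_items name is mine, not anything standard):

#include <cstddef>

// Signed size for containers...
template <class Container>
constexpr std::ptrdiff_t n_items(const Container& c)
{
    return static_cast<std::ptrdiff_t>(c.size());
}

// ...and for raw arrays.
template <class T, std::size_t N>
constexpr std::ptrdiff_t n_items(T (&)[N])
{
    return static_cast<std::ptrdiff_t>(N);
}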
Now, regarding your function signature
StatusType writeBytes(......, size_t& bytesWritten);
This forces the calling code’s choice of type for the bytes written count.
And then, with the unsigned type size_t forced on it, it is easy to introduce a bug, e.g. by checking whether that count is less than or greater than some computed quantity.
A grotesque example: std::string("ah").length() < -5 is guaranteed true.
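You can see it for yourself with this little snippet (it compiles, usually with a sign-compare warning, and prints 1):

#include <iostream>
#include <string>

int main()
{
    // length() returns size_t (unsigned), so -5 is first converted to a
    // huge unsigned value; 2 is certainly less than that, hence "true".
    std::cout << (std::string("ah").length() < -5) << '\n';  // prints 1
}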
So instead, make that …
Size writeBytes(......);
or, if you do not want to use exceptions,
Size writeBytes(......, StatusType& status );
It is OK to have an enumeration of possible statuses as an unsigned type or as whatever, because the only operations on status values will be equality checks and possibly use as keys.
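For concreteness, the supporting definitions might look like this (the Size alias is an assumed definition for the signatures above, and the enumerator names are just placeholders):

#include <cstddef>

// Assumed definition of the Size alias used in the signatures above.
using Size = std::ptrdiff_t;

// Illustrative status enumeration; its underlying type barely matters,
// since status values are only compared for equality or used as keys.
enum class StatusType { ok, deviceError, bufferFull };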
Related
In C++20, std::ssize is being introduced to obtain the signed size of a container for generic code. (The reason for its addition is explained in P1227R1.)
Somewhat peculiarly, the definition given there (combining with common_type and ptrdiff_t) has the effect of forcing the return value to be "either ptrdiff_t or the signed form of the container's size() return value, whichever is larger".
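In other words, the computed return type behaves roughly like this (a simplified model of the wording, not the exact library text):

#include <cstddef>
#include <cstdint>
#include <type_traits>

// Rough model of the widened signed type described above.
template <class SizeType>
using widened_signed_t =
    std::common_type_t<std::ptrdiff_t, std::make_signed_t<SizeType>>;

// Even a container with a uint16_t size() ends up reporting ptrdiff_t:
static_assert(std::is_same_v<widened_signed_t<std::uint16_t>,
                             std::ptrdiff_t>);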
P1227R1 indirectly offers a justification for this ("it would be a disaster for std::ssize() to turn a size of 60,000 into a size of -5,536").
This seems to me like an odd way to try to "fix" that, however.
Containers which intentionally define a uint16_t size and are known to never exceed 32,767 elements will still be forced to use a larger type than required.
The same thing would occur for containers using a uint8_t size and 127 elements, respectively.
In desktop environments, you probably don't care; but this might be important for embedded or otherwise resource-constrained environments, especially if the resulting type is used for something more persistent than a stack variable.
Containers which use the default size_t size on 32-bit platforms but which nevertheless do contain between 2 and 4 billion items will hit exactly the same problem as above.
If there still exist platforms for which ptrdiff_t is smaller than 32 bits, they will hit the same problem as well.
Wouldn't it be better to just use the signed type as-is (without extending its size) and to assert that a conversion error has not occurred (e.g. that the result is not negative)?
Am I missing something?
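For concreteness, a minimal sketch of that suggested alternative (the ssize_as_is name is hypothetical; this is not the standard's definition):

#include <cassert>
#include <type_traits>

// Return the signed form of the container's own size type, unchanged in
// width, and assert that the value survived the signed conversion.
template <class C>
auto ssize_as_is(const C& c)
{
    using R = std::make_signed_t<decltype(c.size())>;
    const R n = static_cast<R>(c.size());
    assert(n >= 0 && "size does not fit in the signed counterpart");
    return n;
}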
To expand on that last suggestion a bit (inspired by Nicol Bolas' answer): if it were implemented the way that I suggested, then this code would Just Work™:
void DoSomething(int16_t i, T const& item);
for (int16_t i = 0, len = std::ssize(rng); i < len; ++i)
{
    DoSomething(i, rng[i]);
}
With the current implementation, however, this produces warnings and/or errors unless static_casts are explicitly added to narrow the result of ssize, or int i is used instead and then narrowed in the function call (and the range indexing); neither of these seems like an improvement.
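For comparison, here is the loop against the actual C++20 std::ssize, with the cast spelled out (the vector and the DoSomething body are concrete stand-ins for the hypothetical ones above):

#include <cstdint>
#include <iostream>
#include <iterator>
#include <vector>

void DoSomething(std::int16_t i, int const& item)
{
    std::cout << i << ": " << item << '\n';
}

int main()
{
    std::vector<int> rng{10, 20, 30};

    // std::ssize returns ptrdiff_t here, so the result must be narrowed
    // explicitly to avoid the conversion warning/error:
    for (std::int16_t i = 0,
                      len = static_cast<std::int16_t>(std::ssize(rng));
         i < len; ++i)
    {
        DoSomething(i, rng[i]);
    }
}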
Containers which intentionally define a uint16_t size and are known to never exceed 32,767 elements will still be forced to use a larger type than required.
It's not like the container is storing the size as this type. The conversion happens via accessing the value.
As for embedded systems, embedded systems programmers already know about C++'s propensity to increase the size of small types. So if they expect a type to be an int16_t, they're going to spell that out in the code, because otherwise C++ might just promote it to an int.
Furthermore, there is no standard way to ask about what size a range is "known to never exceed". decltype(size(range)) is something you can ask for; sized ranges are not required to provide a max_size function. Without such an ability, the safest assumption is that a range whose size type is uint16_t can assume any size within that range. So the signed size should be big enough to store that entire range as a signed value.
Your suggestion is basically that any ssize call is potentially unsafe, since half of any size range cannot be validly stored in the return type of ssize.
Containers which use the default size_t size on 32-bit platforms but which nevertheless do contain between 2B and 4B items will hit exactly the same problem as above.
Assuming that it is valid for ptrdiff_t to not be a signed 64-bit integer on such platforms, there isn't really a valid solution to that problem. So yes, there will be cases where ssize is potentially unsafe.
ssize currently is potentially unsafe in cases where it is not possible to be safe. Your proposal would make ssize potentially unsafe in all cases.
That's not an improvement.
And no, merely asserting/contract checking is not a viable solution. The point of ssize is to make for(int i = 0; i < std::ssize(rng); ++i) work without the compiler complaining about signed/unsigned mismatch. To get an assert because of a conversion failure that didn't need to happen (and BTW, cannot be corrected without using std::size, which we are trying to avoid), one which is ultimately irrelevant to your algorithm? That's a terrible idea.
if it were implemented the way that I suggested, then this code would Just Work™:
Let us ignore the question of how often it is that a user would write this code.
The reason your compiler will expect/require you to use a cast there is because you are asking for an inherently dangerous operation: you are potentially losing data. Your code only "Just Works™" if the current size fits into an int16_t; that makes the conversion statically dangerous. This is not something that should implicitly take place, so the compiler suggests/requires you to explicitly ask for it. And users looking at that code get a big, fat eyesore reminding them that a dangerous thing is being done.
That is all to the good.
See, if your suggested implementation were how ssize behaved, then that means we must treat every use of ssize as just as inherently dangerous as the compiler treats your attempted implicit conversion. But unlike static_cast, ssize is small and easily missed.
Dangerous operations should be called out as such. Since ssize is small and difficult to notice by design, it therefore should be as safe as possible. Ideally, it should be as safe as size, but failing that, it should be unsafe only to the extent that it is impossible to make it safe.
Users should not look on ssize usage as something dubious or disconcerting; they should not fear to use it.
I am relatively new to coding and would like to know how to declare the message length (mlen) and ciphertext length (clen) in my C++ code. I am not too sure what I am supposed to declare them as (int? char? unsigned long long?).
The formula that was given to me to include in my code is:
*clen = mlen + CRYPTO_ABYTES
Information that was given to me is:
It operates on a state of 320-bits with message blocks of 64 bits.
UPDATE: Sorry for the bad question. I realized I was given unsigned long long for the message length; it was written in a smaller font and I did not realize it.
If there is no strict requirement regarding how big your type has to be, you should use the native C++ type designed to represent sizes:
std::size_t
The main advantages of std::size_t over types such as unsigned int, unsigned short, etc. are:
It is capable of holding the maximum possible size of any C++ type, including arrays (since it's defined as the result type of the sizeof operator).
As such, it is typically sized to match your architecture (e.g. the RAM address space).
It is typically used as the type to represent indices and lengths in new[], std::vector, and large parts of the STL in general. As you are likely going to use such data structures and algorithms, you can avoid useless conversions between different types, making your code more readable, without compiler warnings.
The type std::size_t in your code tells you something about its semantics (indices and sizes; unsigned implied). Other integer types such as unsigned int can be used to represent anything.
For additional information, check out cppreference.com.
On the other hand, if you need to pack your data tightly (e.g. for network protocol), you should use the smallest possible type such as std::uint16_t, std::uint32_t etc.
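Applied to the question, a sketch might look like this (the CRYPTO_ABYTES value of 16 is an assumption for illustration; the real constant comes from the crypto library's header, and per the update the assignment actually dictates unsigned long long):

#include <cstddef>

constexpr std::size_t CRYPTO_ABYTES = 16;  // assumed tag size, illustration only

// If nothing forces another type, lengths fit naturally in std::size_t.
void computeCiphertextLength(std::size_t mlen, std::size_t* clen)
{
    *clen = mlen + CRYPTO_ABYTES;  // the formula from the question
}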
I'm struggling to understand the usefulness of the C++ std::size_t data type. I realize that this data type is platform dependent, and is supposedly meant to make code more portable. However, it seems like it doesn't solve all the problems.
Say, for example, I'm working on a machine that has 32-bit ints. Let's say I decide to write a C-style function on this machine that just copies the bytes from one object to another. Inside this function, memcpy is used to write the data from object2 to object1. I've chosen an arbitrarily large number.
void writeBytes(obj *pobj1, obj *pobj2)
{
    memcpy(pobj1, pobj2, 1048575);
}
This code should (hopefully) compile just fine. Because memcpy uses size_t in its declaration, and because size_t on this platform should be 32 bits, the number 1048575 should work just fine.
But now let's say I decide to port this function over to a machine that has 16-bit ints. Now the memcpy function interprets size_t as being 16 bits wide. In this case, 1048575 exceeds the range of values that memcpy was declared to handle. The code then fails to compile.
So my question: how exactly was size_t useful in this case? How did it make our code more portable?
size_t is able to hold the size of the largest object you can create. It is not required to be your platform's largest native integer type.
Given your example, your code would work regardless of the native integer being 16-bit or 32-bit if your platform allows 1048575 byte objects. Or the inverse -- if 1048575 doesn't fit in a size_t, you never could have created an object that large to memcpy.
size_t is susceptible to the same overflow and underflow rules as any other integral type. It's just a typedef of an unsigned integral type. Its purpose is not to prevent you from assigning or casting values that are outside of its range. It defines a standard type that will:
"store the maximum size of a theoretically possible object of any type
(including array)."
If you care about that maximum size, use
std::numeric_limits<std::size_t>::max()
and make decisions off of that.
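For example, a sketch of such a check, assuming the requested count arrives in a wider type:

#include <cstddef>
#include <limits>

// True if the requested byte count can be represented as a size_t.
bool fitsInSizeT(unsigned long long requested)
{
    return requested <= std::numeric_limits<std::size_t>::max();
}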
I think size_t makes (not only your) code more readable and consistent, not necessarily more portable.
It will help porting it, however.
Imagine functions working with some size use a happy mixture of int, short, long, unsigned, unsigned short, ...
When one such function dealing with a size calls another that needs a size parameter too, it's very helpful when one size_t fits all (or at least most, like for the result of read()).
Other than the size of the values that each type can hold, what are the main differences in usage between size_t and off_t? Is it just a convention that size_t types are used for absolute sizes and off_t types are used for offsets? Or does it go deeper than that?
I am writing a wrapper class to enable the writing of large files using mmap and I want to know what the best types are to use for their arguments. Given that I want to write to files > 4GB, I'm tempted to use size_t for everything, but is that the best practice? (or should I be using some off64_t types for certain functions?)
For example, should my writeAt function be declared as:
MMapWriter::writeAt(off64_t offset, const void* src, size_t size)
or
MMapWriter::writeAt(size_t offset, const void* src, size_t size)
size_t is for objects, off_t is for files.
mmap merges the two concepts, pretty much by definition. Personally I think I'd use size_t, since no matter what else it is, a mapped file is also an array in (virtual) memory.
size_t is standard C++, off_t is Posix, and off64_t is a GNU extension that goes with the functions fopen64, ftello64, etc. I think it should always be the same type as off_t on 64 bit GNU systems, but don't bet your company on that without checking.
Should it be relevant, off_t is signed whereas size_t is unsigned. But the signed counterpart to size_t is ptrdiff_t, so when you need a signed type it doesn't automatically mean you should use off_t or off64_t.
size_t is part of the C++ (and C) standards, and refers to the type of a sizeof expression. off_t is defined by the Posix standard, and refers to the size of a file.
A good rule of thumb for this scenario: use whichever function signature results in the least amount of explicit casting, be it C style or C++ style. If you have to cast, the C++ style is safer, as it is more limited in the kinds of conversion it can perform.
The benefit of this is that if you port to a platform where the types don't match up (whether it be a signedness, size, or endianness issue, etc.), you should catch most of the bugs at compile time rather than at runtime. Casting is the sledgehammer approach for squeezing a triangular object into a round hole (more or less, you're telling the compiler to keep quiet because you know what you're doing).
Trying to find casting issues at runtime can be a pain as it can be hard to reproduce. It's better to find issues at compile-time rather than runtime. The compiler is your friend.
This is probably a C++ 101 question: I'm curious what the guidelines are for using size_t and offset_t, e.g. what situations they are intended for, what situations they are not intended for, etc. I haven't done a lot of portable programming, so I have typically just used something like int or unsigned int for array sizes, indexes, and the like. However, I gather it's preferable to use some of these more standard typedefs when possible, so I'd like to know how to do that properly.
As a follow-up question, for development on Windows using Visual Studio 2008, where should I look to find the actual typedefs? I've found size_t defined in a number of headers within the VS installation directory, so I'm not sure which of those I should use, and I can't find offset_t anywhere.
You are probably referring to off_t, not offset_t. off_t is a POSIX type, not a C type, and it is used to denote file offsets (allowing 64-bit file offsets even on 32-bit systems). C99 has superseded that with fpos_t.
size_t is meant to count bytes or array elements. It matches the address space.
Instead of offset_t do you mean ptrdiff_t? This is the type returned by such routines as std::distance. My understanding is that size_t is unsigned (to match the address space as previously mentioned) whereas ptrdiff_t is signed (to theoretically denote "backwards" distances between pointers, though this is very rarely used).
offset_t isn't mentioned at all in my copy of the C++ standard.
size_t on the other hand, is meant simply to denote object or array sizes. A size_t is guaranteed to be big enough to store the size of any in-memory object, so it should be used for that.
You use size_t whenever the C++ language specification indicates that it is used in a particular expression. For example, you'd use size_t to store the return value of sizeof, or to represent the number of elements in an array (new[] takes size_t).
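A trivial illustration of both cases:

#include <cstddef>

int main()
{
    std::size_t s = sizeof(long double);  // sizeof yields a size_t
    int* buffer = new int[s];             // new[] takes a size_t count
    delete[] buffer;
}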
I've no idea what offset_t is - it's not mentioned once in ISO C++ spec.
Dragging an old one back up... offset_t (which is a long long) is used by Solaris' llseek() system call, and is to be used anywhere you're likely to run into real 64-bit file offsets... meaning you're working with really huge files.