C++, best practices, int or size_t? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
When to use std::size_t?
hello.
Assuming usage patterns are the same (i.e. no negative numbers), which is preferable to use for various indexes, int or size_t type?
Is there performance difference in your experience on 64-bit Intel between the two?
Thank you

size_t is the type that should be used for array indexing when you work with relatively generic arrays, i.e. when you have just an array of abstract chars, ints or something else.
When you are working with a specific array, i.e. an array that contains some elements specific for your application, you should normally already have a "type of choice" to count or to index the entities of that type in your application. That's the type you should use. For example, if some array contains the records for company employees, then you should already have a "type of choice" in your program that you use to designate the "quantity of employees". That's the type you should use for indexing arrays of employee records. It could be unsigned int, it could be employee_count_t or something like that. Using a naked size_t for that purpose is a design error.
Note also that size_t is not a type immediately intended for array indexing. It is a type intended to represent the size of the largest object in the program. It "works" for arrays by transitivity: arrays are objects, hence size_t is always enough to index an array. However, when you design a program it makes more sense to think in terms of generic containers, instead of thinking in terms of specific arrays. Today it might be an array, tomorrow you might have to switch to a linked list or a tree instead. In the general case, the range of size_t is not sufficient to represent the number of elements in an abstract container, which is why size_t in such cases is not a good choice.
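As a minimal sketch of the "type of choice" idea above: the names employee_count_t, Employee and count_active are hypothetical, and unsigned int is only one possible underlying type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical domain type: the one place that decides how employees are counted.
using employee_count_t = unsigned int;

// Hypothetical record type, reduced to a single field for illustration.
struct Employee {
    bool active;
};

// Counts active employees, indexing with the domain type rather than a naked size_t.
employee_count_t count_active(const std::vector<Employee>& staff)
{
    // The narrowing from the container's size type is checked once, at the boundary.
    assert(staff.size() <= static_cast<employee_count_t>(-1));
    const employee_count_t total = static_cast<employee_count_t>(staff.size());

    employee_count_t active = 0;
    for (employee_count_t i = 0; i < total; ++i) {
        if (staff[i].active) {
            ++active;
        }
    }
    return active;
}
```

If the company ever outgrows unsigned int, only the alias has to change.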

It depends on what you are doing. If you are iterating over a vector, then use std::size_t:
for (std::size_t i = 0; i < vec.size(); i++) {
// do something with vec[i]
}
However, beware of coding errors such as:
for (std::size_t i = 99; i >= 0; i--) {
// This is an infinite loop
}
If you are just writing a loop, you might want to use a plain int because of the situation above. There should be no performance difference between using int and std::size_t. If you need an exact width, then you should use neither int nor size_t, but rather the fixed-width types defined in stdint.h (<cstdint> in C++).
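For the case where a std::size_t index really does have to count down, one common idiom is to test before decrementing. The sketch below is only an illustration; the function name reversed is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the elements of vec in reverse order using an unsigned index.
std::vector<int> reversed(const std::vector<int>& vec)
{
    std::vector<int> out;
    out.reserve(vec.size());
    // The condition compares the old value of i with 0 and then decrements,
    // so the body sees size-1 down to 0 and an empty vector skips the loop.
    for (std::size_t i = vec.size(); i-- > 0;) {
        out.push_back(vec[i]);
    }
    assert(out.size() == vec.size());
    return out;
}
```

Unlike the i >= 0 version, this loop terminates, because the comparison never depends on i becoming negative.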

The types aren't different in the sense you're implying. Generally int is 32 bits, and size_t is the width of the platform word (32 or 64 bits). I'd suggest you use size_t when dealing with files, buffers, and anything else that might describe an area of memory.
Furthermore you should note that int is signed, while size_t is not.
Finally, int was historically used where size_t should be used now. However int is still useful in its own right for other purposes.

size_t or ptrdiff_t. int might not be enough to access all the elements of an array.

Related

Authoritative "correct" way to avoid signed-unsigned warnings when testing a loop variable against size_t

The code below generates a compiler warning:
void test()
{
byte buffer[100];
for (int i = 0; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
}
warning: comparison between signed and unsigned integer expressions
[-Wsign-compare]
This is because sizeof() returns a size_t, which is unsigned.
I have seen a number of suggestions for how to deal with this, but none with a preponderance of support and none with any convincing logic nor any references to support one approach as clearly "better." The most common suggestions seem to be:
1. ignore the warnings
2. turn off the warnings
3. use a loop variable of type size_t
4. use a loop variable of type size_t with tricks to avoid decrementing past zero
5. cast sizeof(buffer) to an int
6. some extremely convoluted suggestions that I did not have the patience to follow because they involved unreadable code, generally involving vectors and/or iterators
7. libraries that I cannot load in the AVR / ARM embedded environments I often use.
8. free functions returning a valid int or long representing the byte count of T
9. Don't use loops (gotta love that advice)
Is there a "correct" way to approach this?
-- Begin Edit --
The example I gave is, of course, trivial, and meant only to demonstrate the type mismatch warning that can occur in an indexing situation.
#3 is not necessarily the obviously correct answer because size_t carries special risks in a decrementing loop such as
for (size_t i = myArray.size; i > 0; --i)
(the array may someday have a size of zero).
#4 is a suggestion to deal with decrementing size_t indexes by including appropriate and necessary checks to avoid ever decrementing past zero. Since that makes the code harder to read, there are some cute shortcuts that are not particularly readable, hence my referring to them as "tricks."
#7 is a suggestion to use libraries that are not generalizable in the sense that they may not be available or appropriate in every setting.
#8 is a suggestion to keep the checks readable, but to hide them in a non-member method, sometimes referred to as a "free function."
#9 is a suggestion to use algorithms rather than loops. This was offered many times as a solution to the size_t indexing problem, and there were a lot of upvotes. I include it even though I can't use the stl library in most of my environments and would have to write the code myself.
-- End Edit--
I am hoping for evidence-based guidance or references as to best practices for handling something like this. Is there a "standard text" or a style guide somewhere that addresses the question? A defined approach that has been adopted/endorsed internally by a major tech company? An emulatable solution forthcoming in a new language release? If necessary, I would be satisfied with an unsupported public recommendation from a single widely recognized expert.
None of the options on offer seem very appealing. The warnings drown out other things I want to see. I don't want to miss signed/unsigned comparisons in places where it might matter. Decrementing a loop variable of type size_t with comparison >=0 results in an infinite loop from unsigned integer wraparound, and even if we protect against that with something like for (size_t i = sizeof(buffer); i-->0 ;), there are other issues with incrementing/decrementing/comparing to size_t variables. Testing against size_t - 1 will yield a large positive 'oops' number when size_t is unexpectedly zero (e.g. strlen(myEmptyString)). Casting an unsigned size_t to a signed integer is a range problem (the value is not guaranteed to fit), and of course size_t could potentially be bigger than an int.
Given that my arrays are of known sizes well below INT_MAX, it seems to me that casting size_t to a signed integer is the best of the bunch, but it makes me cringe a little bit. Especially if it has to be static_cast<int>. Easier to take if it's hidden in a function call with some size testing, but still...
Or perhaps there's a way to turn off the warnings, but just for loop comparisons?

I find any of the three following approaches equally good.
Use a variable of type int to store the size and compare the loop variable to it.
byte buffer[100];
int size = sizeof(buffer);
for (int i = 0; i < size; ++i)
{
buffer[i] = 0;
}
Use size_t as the type of the loop variable.
byte buffer[100];
for (size_t i = 0; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
Use a pointer.
byte buffer[100];
byte* end = buffer + sizeof(buffer);
for (byte* p = buffer; p < end; ++p)
{
*p = 0;
}
If you are able to use a C++11 compiler, you can also use a range for loop.
byte buffer[100];
for (byte& b : buffer)
{
b = 0;
}
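Where the standard library is available, the same zeroing can also be written with an algorithm instead of a loop. This is only a sketch: the byte typedef is an assumption (the question does not show its definition) and fill_bytes is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

typedef unsigned char byte;  // assumed definition of byte

// Sets every element of a fixed-size buffer to value without an explicit index,
// so no signed/unsigned comparison appears in user code.
template <std::size_t N>
void fill_bytes(byte (&buffer)[N], byte value)
{
    std::fill(std::begin(buffer), std::end(buffer), value);
}
```

A call such as fill_bytes(buffer, 0) deduces the array length from the argument.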

The most appropriate solution will depend entirely on context. In the context of the code fragment in your question the most appropriate action is perhaps to have type-agreement - the third option in your bullet list. This is appropriate in this case because the usage of i throughout the code is only to index the array - in this case the use of int is inappropriate - or at least unnecessary.
On the other hand if i were an arithmetic object involved in some arithmetic expression that was itself signed, the int might be appropriate and a cast would be in order.
I would suggest that as a guideline, a solution that involves the fewest necessary type casts (explicit or implicit) is appropriate, or to look at it another way, the maximum possible type agreement. There is not one "authoritative" rule because the purpose and usage of the variables involved is semantically rather than syntactically dependent. In this case also, as has been pointed out in other answers, newer language features supporting iteration may avoid this specific issue altogether.
To discuss the advice you say you have been given specifically:
ignore the warnings
Never a good idea - some will be genuine semantic errors or maintenance issues, and by the time you have several hundred warnings you are ignoring, how will you spot the one warning that is an issue?
turn off the warnings
An even worse idea; the compiler is helping you to improve your code quality and reliability. Why would you disable that?
use a loop variable of type size_t
In this precise example, that is exactly what you should do; exact type agreement should always be the aim.
use a loop variable of type size_t with tricks to avoid decrementing past zero
This advice is irrelevant for the trivial example given. Moreover I presume that by "tricks" the adviser in fact means checks or just correct code. There is no need for "tricks" and the term is entirely ambiguous - who knows what the adviser means? It suggests something unconventional and a bit "dirty", when there is no need for any solution with such attributes.
cast sizeof(buffer) to an int
This may be necessary if the usage of i warrants the use of int for correct semantics elsewhere in the code. The example in the question does not, so this would not be an appropriate solution in this case. Essentially if making i a size_t here causes type agreement warnings elsewhere that cannot themselves be resolved by universal type agreement for all operands in an expression, then a cast may be appropriate. The aim should be to achieve zero warnings and minimum type casts.
some extremely convoluted suggestions that I did not have the patience to follow, generally involving vectors and/or iterators
If you are not prepared to elaborate on or even consider such advice, you would have done better to omit it from your question. In any case, the use of STL containers is not always appropriate for a large segment of embedded targets; excessive code size increase and non-deterministic heap management are reasons to avoid them on many platforms and applications.
libraries that I cannot load in an embedded environment.
Not all embedded environments have equal constraints. The restriction is on your embedded environment, not by any means all embedded environments. However the "loading of libraries" to resolve or avoid type agreement issues seems like a sledgehammer to crack a nut.
free functions returning a valid int or long representing the byte count of T
It is not clear what that means. What is a "free function"? Is that just a non-member function? Such a function would necessarily have a type cast internally, so what have you achieved other than hiding a type cast?
Don't use loops (gotta love that advice).
I doubt you needed to include that advice in your list. The problem is not in any case limited to loops; it is not because you are using a loop that you have the warning, it is because you have used < with mismatched types.
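For reference, the "free function" option discussed above might look roughly like the following sketch. The name array_count is hypothetical, and it returns the element count of a built-in array (which equals the byte count only for single-byte elements). The cast is still there, but the range check moves to compile time.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: element count of a built-in array as a signed int.
// The static_assert rejects arrays too large for int when the code is compiled.
template <typename T, std::size_t N>
constexpr int array_count(const T (&)[N])
{
    static_assert(N <= static_cast<std::size_t>(INT_MAX),
                  "array has more elements than an int can represent");
    return static_cast<int>(N);
}

// Zeroes a fixed-size buffer with a signed loop index and returns the sum of
// its elements afterwards (0 if every element was cleared).
inline int zero_buffer_demo()
{
    unsigned char buffer[100];
    for (int i = 0; i < array_count(buffer); ++i) {
        buffer[i] = 0;
    }
    int sum = 0;
    for (int i = 0; i < array_count(buffer); ++i) {
        sum += buffer[i];
    }
    return sum;
}
```

Both operands of < are int here, so the comparison is between matching signed types.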

My favorite solution is to use C++11 or newer and skip the whole manual size bounding entirely like so:
// assuming byte is defined by something like using byte = std::uint8_t;
void test()
{
byte buffer[100];
for (auto&& b: buffer)
{
b = 0;
}
}
Alternatively, if I can't use the range-based for loop (but still can use C++11 or newer), my favorite syntax becomes:
void test()
{
byte buffer[100];
for (auto i = decltype(sizeof(buffer)){0}; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
}
Or for iterating backwards:
void test()
{
byte buffer[100];
// relies on the well-defined wraparound (modular) behavior of unsigned integers
for (auto i = sizeof(buffer) - 1; i < sizeof(buffer); --i)
{
buffer[i] = 0;
}
}

The correct generic way is to use a loop iterator of type size_t, simply because this is the most correct type to use for describing an array size.
There is not much need for "tricks to avoid decrementing past zero", because the size of an object can never be negative.
If you find yourself needing negative numbers to describe a variable size, it is probably because you have some special case where you are iterating across an array backwards. If so, the "trick" to deal with it is this:
for(size_t i=0; i<sizeof(array); i++)
{
size_t index = sizeof(array)-1 - i;
array[index] = something;
}
However, size_t is often an inconvenient type to use in embedded systems, because it may end up as a larger type than what your MCU can handle with one instruction, resulting in needlessly inefficient code. It may then be better to use a fixed width integer such as uint16_t, if you know the maximum size of the array in advance.
Using plain int in an embedded system is almost certainly incorrect practice. Your variables must be of deterministic size and signedness - most variables in an embedded system are unsigned. Signed variables also lead to major problems whenever you need to use bitwise operators.
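A minimal sketch of the fixed-width approach, assuming the buffer size is known at compile time; kBufferSize and fill_buffer are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed for illustration: a buffer whose size is fixed at compile time.
constexpr std::size_t kBufferSize = 100;

// Fails the build if the chosen index type could not address every element.
static_assert(kBufferSize <= UINT16_MAX, "uint16_t index too narrow for buffer");

// Fills the buffer using a 16-bit index and returns the number of bytes written.
inline std::uint16_t fill_buffer(std::uint8_t (&buffer)[kBufferSize], std::uint8_t value)
{
    std::uint16_t written = 0;
    for (std::uint16_t i = 0; i < kBufferSize; ++i) {
        buffer[i] = value;
        ++written;
    }
    return written;
}
```

The static_assert documents the size assumption and turns a later violation into a build error rather than a silent wraparound.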

If you are able to use C++ 11, you could use decltype to obtain the actual type of what sizeof returns, for instance:
void test()
{
byte buffer[100];
// On macOS decltype(sizeof(buffer)) is unsigned long, so this compiles
// without warnings.
for (decltype(sizeof(buffer)) i = 0; i < sizeof(buffer); ++i)
{
buffer[i] = 0;
}
}

int vs unsigned int vs size_t

Starting a more advanced C++ course, we have to implement our own Matrix class, which is typical for a first exercise. We received a skeleton to work on and I have only one question left: the type of the access and size variables.
Here is a simple constructor for a 1D Matrix, with some checking of the size.
Array::Array( int xSize )
{
CHECK_MSG(xSize > 0, "Array size too small");
array_ = new real[xSize];
size_ = xSize;
}
Does it make sense to use a size_t or unsigned int instead of an int? After reading the definition of size_t I would tend to use it instead. However, in a lot of code I see just ints everywhere. Is it a Java-like coding style? Does size_t have any disadvantages I missed?
Edit:
The main question relates to coding style. I fully understand the difference between size_t and (unsigned) int, as it was already explained here: unsigned-int-vs-size-t

The C++ standard library would almost certainly use a std::size_t for such a type.
Using a signed type is obviously not desirable and, ideally, you want to use a type that lends itself well to having an object that supports iterability.
From the outset I recommend you use typedef std::size_t MySize; within your class, mainly to future-proof yourself. That would be the most sensible choice.
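A rough sketch of that suggestion follows. Several things are assumptions made for the example: real is taken to be double, std::vector is swapped in for the raw new[] of the skeleton, and a plain assert stands in for the course's CHECK_MSG macro.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef double real;  // assumed; the exercise skeleton defines its own real type

class Array {
public:
    // One alias to change if the size type ever needs to differ.
    typedef std::size_t MySize;

    explicit Array(MySize xSize) : data_(xSize, real(0))
    {
        // An unsigned size cannot be negative, so only zero needs rejecting.
        assert(xSize > 0 && "Array size too small");
    }

    MySize size() const { return data_.size(); }

    // Bounds-checked element access (checked only when asserts are enabled).
    real& operator()(MySize i)
    {
        assert(i < size());
        return data_[i];
    }

private:
    std::vector<real> data_;  // swapped in for the raw new[] of the original
};
```

Callers can then write Array::MySize for their loop indices and never spell out the underlying type.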

Can the size of anything ever be negative? No, so use unsigned for xSize to express the intent of the code more explicitly.
If you need it be able to handle really big sizes, use size_t.

difference between size_type and int

#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<double> student_marks(20);
for (vector<double>::size_type i = 0; i < 20; i++)
{
cout << "Enter marks for student #" << i+1
<< ": " << flush;
cin >> student_marks[i];
}
return 0;
}
I read somewhere that it is better to use size_type in place of int . Does it really make a huge impact on the implementation and what are the positives of using size_type ?

vector<double>::size_type is guaranteed to cover the full range of possible values of the size of a vector<double>. An int is not.
Note that vector<double>::size_type is usually the same as std::size_t, so in general it would be OK to use the latter. However, custom allocators can result in a vector with a size_type different from std::size_t.

size_type is an unsigned number. This means it cannot be negative. Which may sound a logical choice for container sizes, but in real life creates a lot of problems.
For example, when you subtract one size_type from another, the result silently wraps around to a huge positive number if the second operand is greater than the first:
std::vector<double>::size_type size1 = v1.size();
std::vector<double>::size_type size2 = v2.size();
if ((size1 - size2) > 5) // dangerous if v2 is bigger than v1!
{
// ...
}
And it often makes error checking impossible:
void f(std::vector<double>::size_type size)
{
if (size < 0)
{
// error handling will never be reached
}
}
f(-1); // no error handling
It is for such problems that you usually prefer (signed) int to unsigned in your code, and it is unfortunate that standard library containers express their sizes as unsigned numbers. You will often want to convert (static_cast) sizes to signed numbers as soon as possible. In your example, it's fine to stay unsigned, because it's just the loop index, and casting to int first would be over-engineering IMO.
The good news is that you can often avoid size_type altogether by using iterators:
for (std::vector<double>::const_iterator iter = v.begin(); iter != v.end(); ++iter)
{
std::cout << *iter;
}
I invite you to have a look at Signed and Unsigned Types in Interfaces by Scott Meyers.
Finally, please note that this point of view is not universally shared by experienced C++ programmers. Searching for past discussions about this topic on the web (old Stackoverflow questions, comp.lang.c++ Usenet archive etc) will reveal different opinions.
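If the sizes have to stay unsigned, the subtraction pitfall described above can be avoided by comparing before subtracting. A small sketch, with exceeds_by and exceeds_by_naive as made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when a holds more than margin elements beyond b.
// Comparing first keeps the unsigned difference from wrapping around.
inline bool exceeds_by(const std::vector<double>& a,
                       const std::vector<double>& b,
                       std::size_t margin)
{
    return a.size() > b.size() && (a.size() - b.size()) > margin;
}

// The naive form, kept for comparison: the difference wraps when b is larger,
// so this reports true even though a is the smaller vector.
inline bool exceeds_by_naive(const std::vector<double>& a,
                             const std::vector<double>& b,
                             std::size_t margin)
{
    return (a.size() - b.size()) > margin;
}
```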

The only benefit of using size_type, as opposed to size_t, is conformity. It pleases those who learn by rote and reason purely by association. They do like they see others like themselves do, and want to have that be perceived as “right”.
The size_type member is a customization point for the container’s size type. It comes from the allocator template parameter. With the default standard allocator it's just a synonym for size_t, i.e. these are, by default, the same type.
The problem with int, as opposed to the signed type ptrdiff_t (the result type for a pointer difference), is that int may not have sufficient range for a really huge vector on a 64-bit system.
The problem with size_t is that it can easily cause inadvertent use of modulo arithmetic in mixed type expressions. For example, the expression std::string( "Hi!" ).length() < -7 will always yield true. Thus it's a bug-attractor, so ptrdiff_t, or just plain int when you're sure of the range, is very much objectively preferable (even though it may displease the aforementioned folks enormously – there is still a very strong social pressure to use the objectively inferior solution).
In the standard library the unsigned type size_t is used for historical reasons.
In particular, on 16-bit systems memory was a very limited resource, and one had to use every little dirty trick to make use of absolutely all of it – for those small systems, in those days, it was well worth the risk of some wrap-around bugs etc. Today it isn't.
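One way to act on this is to convert sizes to ptrdiff_t at the point of use. The helper below is a hypothetical sketch (C++20 offers std::ssize for a similar purpose); signed_size and shorter_than are names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: a container's size as the signed type ptrdiff_t.
template <typename Container>
std::ptrdiff_t signed_size(const Container& c)
{
    return static_cast<std::ptrdiff_t>(c.size());
}

// With a signed size, a comparison against a negative limit behaves as in
// ordinary arithmetic instead of converting the limit to a huge unsigned value.
inline bool shorter_than(const std::string& s, std::ptrdiff_t limit)
{
    return signed_size(s) < limit;
}
```

With this helper, the "Hi!" example compares 3 against -7 as signed values and yields false.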

Compare with size_t, return int?

I'm writing some code examples from "How to Think Like a Computer Scientist in C++", and this one is about handling playing-card type objects and decks. I'm facing this situation:
int Card::find(const std::vector<Card>& deck) const {
size_t deckSize = deck.size();
for (size_t i=0; i<deckSize; i++)
if (equals(*this, deck[i])) return i;
return -1;
}
I couldn't use ".length()" on a vector in C++ in Visual Studio 2010 as in the text, and instead had to use .size() which returns (I believe) std::size_type. I figured I could use size_t and get away with it in order to avoid problems on different architectures, as I've been reading, but I'm wondering if I return i, but it's bigger than an integer, will I crash the program?
[Edited to be more specific in my question:]
Once I start using vectors for larger things than cards, I considered using unsigned int because of a compiler mismatch warning, but I feel returning an unsigned int or int has a few issues: 1) int will not take a sufficiently large vector index. 2) returning unsigned int will not let me return -1. 3) unsigned int isn't equal to size_t on all architectures (I'm also doing microcontroller programming on an ARM Cortex-M3).
What should I do if I ever have a large enough vector?

Casting from size_t to int will not "crash" your program, but it's bad practice. On the other hand, the STL includes a nice find algorithm (std::find) for what you are doing.
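A sketch of what that could look like, assuming a simplified Card with an operator== (the book's real Card type and its equals function are not shown here, so these are stand-ins). Returning std::ptrdiff_t rather than int is a choice made for the example so that -1 stays available as "not found".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the book's Card type.
struct Card {
    int suit;
    int rank;
};

inline bool operator==(const Card& a, const Card& b)
{
    return a.suit == b.suit && a.rank == b.rank;
}

// Returns the position of card in deck, or -1 if it is absent.
// std::ptrdiff_t is signed and is the type iterator differences produce here.
inline std::ptrdiff_t find_card(const std::vector<Card>& deck, const Card& card)
{
    const std::vector<Card>::const_iterator it =
        std::find(deck.begin(), deck.end(), card);
    return it == deck.end() ? -1 : it - deck.begin();
}
```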

int is 32 bits on both 32-bit and 64-bit Windows and Linux. i will get truncated if it is greater than or equal to 2 to the 31st power. You could use unsigned int and your program will be fine unless you store more than 4 G elements in the vector :)

size_t is typically an unsigned int but you can't rely on that. If it's bigger than an int you won't crash, you'll just overflow into a (probably negative) number.
Assuming you're not going to have several tens of thousands of cards in one vector, I'd be happy returning the int.

You can also return std::pair<size_t, bool>, similar to std::map insert(). Second template argument means success or fail.
If you are OK with this, you could also use boost::optional.
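A minimal sketch of the pair-returning variant, using a vector of int instead of cards to keep it short; find_index is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {index, true} when value is found, or {0, false} otherwise.
// The index is only meaningful when the bool is true.
inline std::pair<std::size_t, bool> find_index(const std::vector<int>& items, int value)
{
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i] == value) {
            return std::make_pair(i, true);
        }
    }
    return std::make_pair(std::size_t(0), false);
}
```

The index keeps its full unsigned range, and no value has to be reserved as a "not found" marker.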

Where can I look up the definition of size_type for vectors in the C++ STL?

It seems safe to cast the result of my vector's size() function to an unsigned int. How can I tell for sure, though? My documentation isn't clear about how size_type is defined.

Do not assume the type of the container size (or anything else typed inside).
Today?
The best solution for now is to use:
std::vector<T>::size_type
Where T is your type. For example:
std::vector<std::string>::size_type i ;
std::vector<int>::size_type j ;
std::vector<std::vector<double> >::size_type k ;
(Using a typedef could help make this better to read)
The same goes for iterators, and all other types "inside" STL containers.
After C++0x?
Once the compiler is able to deduce the type of the variable, you'll be able to use the auto keyword. For example:
void doSomething(const std::vector<double> & p_aData)
{
std::vector<double>::size_type i = p_aData.size() ; // Old/Current way
auto j = p_aData.size() ; // New C++0x way, definition
decltype(p_aData.size()) k; // New C++0x way, declaration
}
Edit: Question from JF
What if he needs to pass the size of the container to some existing code that uses, say, an unsigned int? – JF
This is a problem common to the use of the STL: You cannot do it without some work.
The first solution is to design the code to always use the STL type. For example:
typedef std::vector<int>::size_type VIntSize ;
VIntSize getIndexOfSomeItem(const std::vector<int> & p_aInt)
{
return static_cast<VIntSize>(-1) ; // placeholder: the found value, or some kind of npos-like sentinel
}
The second is to make the conversion yourself, using either a static_cast or a function that will assert if the value goes out of bounds of the destination type (sometimes I see code using "char" because, "you know, the index will never go beyond 256" [I quote from memory]).
I believe this could be a full question in itself.
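The checked conversion mentioned above could be sketched as follows. The name to_unsigned_int is hypothetical, and the sketch throws instead of asserting so that the check also runs in release builds; an assert would work the same way in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts a size to unsigned int, refusing values that
// do not fit in the destination type.
inline unsigned int to_unsigned_int(std::size_t value)
{
    if (value > std::numeric_limits<unsigned int>::max()) {
        throw std::out_of_range("size does not fit in unsigned int");
    }
    return static_cast<unsigned int>(value);
}
```

Legacy code that expects an unsigned int can then be called with to_unsigned_int(v.size()), and an oversized container is reported instead of being silently truncated.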

According to the standard, you cannot be sure. The exact type depends on your machine. You can look at the definition in your compiler's header implementations, though.

I can't imagine that it wouldn't be safe on a 32-bit system, but 64-bit could be a problem (since ints remain 32 bit). To be safe, why not just declare your variable to be vector<MyType>::size_type instead of unsigned int?

It should always be safe to cast it to size_t. unsigned int isn't enough on most 64-bit systems, and even unsigned long isn't enough on Windows (which uses the LLP64 model instead of the LP64 model most Unix-like systems use).

The C++ standard only states that size_t is found in <cstddef>, which puts the identifiers in <stddef.h>. My copy of Harbison & Steele places the minimum and maximum values for size_t in <stdint.h>. That should give you a notion of how big your recipient variable needs to be for your platform.
Your best bet is to stick with integer types that are large enough to hold a pointer on your platform. In C99, that'd be intptr_t and uintptr_t, also officially located in <stdint.h>.

As long as you're sure that an unsigned int on your system will be large enough to hold the number of items you'll have in the vector you should be safe ;-)

I'm not sure how well this will work because I'm just thinking off the top of my head, but a compile-time assertion (such as BOOST_STATIC_ASSERT() or see Ways to ASSERT expressions at build time in C) might help. Something like:
BOOST_STATIC_ASSERT( sizeof( unsigned int) >= sizeof( size_type));
