int vs unsigned int vs size_t - c++

Starting a more advanced C++ course, we have to implement our own Matrix class, which is a typical first exercise. We received a skeleton to work on, and I have only one question left: the type of the access and size variables.
Here is a simple constructor for a 1D matrix, with some checking of the size:
Array::Array( int xSize )
{
    CHECK_MSG(xSize > 0, "Array size too small");
    array_ = new real[xSize];
    size_ = xSize;
}
Does it make sense to use size_t or unsigned int instead of int? After reading the definition of size_t I would tend to use it instead. However, in a lot of code I see just ints everywhere. Is that a Java-like coding style? Does size_t have any disadvantages I missed?
Edit:
The main question relates to coding style. I fully understand the difference between size_t and (unsigned) int, as it was already explained here: unsigned-int-vs-size-t

The C++ standard library would almost certainly use a std::size_t for such a type.
Using a signed type is obviously not desirable, and ideally you want a type that lends itself well to making the object iterable.
From the outset I recommend you use typedef std::size_t MySize; within your class, mainly to future-proof yourself. That would be the most sensible choice.
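A minimal sketch of what that might look like for the 1D case from the question. The `real` typedef (double here) and the use of an exception in place of the skeleton's CHECK_MSG macro are assumptions, since the course skeleton isn't shown:

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Assumption: the course skeleton's `real` is a floating-point typedef.
typedef double real;

class Array
{
public:
    // Single point of change for the size/index type.
    typedef std::size_t MySize;

    explicit Array(MySize xSize)
        : array_(new real[checkedSize(xSize)]()), size_(xSize)
    {
    }

    MySize size() const { return size_; }
    real& operator[](MySize i) { return array_[i]; }
    const real& operator[](MySize i) const { return array_[i]; }

private:
    // An unsigned size can never be negative, so zero is the only bad value.
    static MySize checkedSize(MySize n)
    {
        if (n == 0)
            throw std::invalid_argument("Array size too small");
        return n;
    }

    std::unique_ptr<real[]> array_;  // elements are value-initialized to 0
    MySize size_;
};
```

With an unsigned size type the only invalid value left to check is zero, and changing MySize later touches one line.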

Can the size of anything ever be negative? No, so use unsigned for xSize to express the intent of the code more explicitly.
If you need it to be able to handle really big sizes, use size_t.

Related

Authoritative "correct" way to avoid signed-unsigned warnings when testing a loop variable against size_t

The code below generates a compiler warning:
void test()
{
    byte buffer[100];
    for (int i = 0; i < sizeof(buffer); ++i)
    {
        buffer[i] = 0;
    }
}
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
This is because sizeof yields a value of type size_t, which is unsigned.
I have seen a number of suggestions for how to deal with this, but none with a preponderance of support and none with any convincing logic nor any references to support one approach as clearly "better." The most common suggestions seem to be:
1. ignore the warnings
2. turn off the warnings
3. use a loop variable of type size_t
4. use a loop variable of type size_t with tricks to avoid decrementing past zero
5. cast sizeof(buffer) to an int
6. some extremely convoluted suggestions that I did not have the patience to follow because they involved unreadable code, generally involving vectors and/or iterators
7. libraries that I cannot load in the AVR / ARM embedded environments I often use.
8. free functions returning a valid int or long representing the byte count of T
9. Don't use loops (gotta love that advice)
Is there a "correct" way to approach this?
-- Begin Edit --
The example I gave is, of course, trivial, and meant only to demonstrate the type mismatch warning that can occur in an indexing situation.
#3 is not necessarily the obviously correct answer because size_t carries special risks in a decrementing loop such as
for (size_t i = myArray.size; i > 0; --i)
(the array may someday have a size of zero).
#4 is a suggestion to deal with decrementing size_t indexes by including appropriate and necessary checks to avoid ever decrementing past zero. Since that makes the code harder to read, there are some cute shortcuts that are not particularly readable, hence my referring to them as "tricks."
#7 is a suggestion to use libraries that are not generalizable in the sense that they may not be available or appropriate in every setting.
#8 is a suggestion to keep the checks readable, but to hide them in a non-member function, sometimes referred to as a "free function."
#9 is a suggestion to use algorithms rather than loops. This was offered many times as a solution to the size_t indexing problem, and there were a lot of upvotes. I include it even though I can't use the STL in most of my environments and would have to write the code myself.
-- End Edit--
I am hoping for evidence-based guidance or references as to best practices for handling something like this. Is there a "standard text" or a style guide somewhere that addresses the question? A defined approach that has been adopted/endorsed internally by a major tech company? An emulatable solution forthcoming in a new language release? If necessary, I would be satisfied with an unsupported public recommendation from a single widely recognized expert.
None of the options on offer seem very appealing. The warnings drown out other things I want to see. I don't want to miss signed/unsigned comparisons in places where it might matter. Decrementing a loop variable of type size_t with the comparison >= 0 results in an infinite loop because of unsigned integer wraparound, and even if we protect against that with something like for (size_t i = sizeof(buffer); i-- > 0;), there are other issues with incrementing, decrementing, and comparing size_t variables. Testing against a size_t value minus 1 yields a large positive 'oops' number when that value is unexpectedly zero (e.g. strlen(myEmptyString) - 1). Casting an unsigned size_t to an int is a range problem (the value is not guaranteed to fit), and of course size_t could potentially be bigger than an int.
Given that my arrays are of known sizes well below INT_MAX, it seems to me that casting size_t to a signed integer is the best of the bunch, but it makes me cringe a little bit, especially if it has to be static_cast<int>. It's easier to take if it's hidden in a function call with some size testing, but still...
Or perhaps there's a way to turn off the warnings, but just for loop comparisons?
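One way to keep such a cast readable and checked, in the spirit of options 5 and 8 above, is a small free function that refuses at compile time to convert an array size that doesn't fit in int. This is a sketch; `int_size`, `zero_buffer` and the `byte` typedef are hypothetical names. (C++20 later added std::ssize in <iterator>, which returns a signed size, but the version below only needs C++11.)

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: the element count of a built-in array as an int.
// The static_assert rejects, at compile time, any array too large for int,
// so the cast below can never lose information.
template <typename T, std::size_t N>
constexpr int int_size(const T (&)[N]) noexcept
{
    static_assert(N <= static_cast<std::size_t>(INT_MAX), "array too large for int");
    return static_cast<int>(N);
}

typedef unsigned char byte;  // assumption: byte is an unsigned 8-bit type

void zero_buffer(byte (&buffer)[100])
{
    // Signed loop variable compared with a signed bound: no -Wsign-compare warning.
    for (int i = 0; i < int_size(buffer); ++i)
    {
        buffer[i] = 0;
    }
}
```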
I find any of the three following approaches equally good.
Use a variable of type int to store the size and compare the loop variable to it.
byte buffer[100];
int size = sizeof(buffer);
for (int i = 0; i < size; ++i)
{
    buffer[i] = 0;
}
Use size_t as the type of the loop variable.
byte buffer[100];
for (size_t i = 0; i < sizeof(buffer); ++i)
{
    buffer[i] = 0;
}
Use a pointer.
byte buffer[100];
byte* end = buffer + sizeof(buffer);
for (byte* p = buffer; p < end; ++p)
{
    *p = 0;
}
If you are able to use a C++11 compiler, you can also use a range-based for loop.
byte buffer[100];
for (byte& b : buffer)
{
    b = 0;
}
The most appropriate solution will depend entirely on context. In the context of the code fragment in your question, the most appropriate action is perhaps to have type agreement, the third option in your bullet list. This is appropriate here because i is used only to index the array, so the use of int is inappropriate, or at least unnecessary.
On the other hand if i were an arithmetic object involved in some arithmetic expression that was itself signed, the int might be appropriate and a cast would be in order.
As a guideline, I would suggest that the solution involving the fewest necessary type casts (explicit or implicit) is appropriate; or, to look at it another way, the one with the maximum possible type agreement. There is no single "authoritative" rule, because the purpose and usage of the variables involved depend on semantics rather than syntax. Also, as has been pointed out in other answers, newer language features supporting iteration may avoid this specific issue altogether.
To discuss the advice you say you have been given specifically:
ignore the warnings
Never a good idea. Some will be genuine semantic errors or maintenance issues, and by the time you have several hundred warnings you are ignoring, how will you spot the one warning that is an issue?
turn off the warnings
An even worse idea; the compiler is helping you to improve your code quality and reliability. Why would you disable that?
use a loop variable of type size_t
In this precise example, that is exactly what you should do; exact type agreement should always be the aim.
use a loop variable of type size_t with tricks to avoid decrementing past zero
This advice is irrelevant for the trivial example given. Moreover, I presume that by "tricks" the adviser in fact means checks, or just correct code. There is no need for "tricks", and the term is entirely ambiguous: who knows what the adviser means? It suggests something unconventional and a bit "dirty", when there is no need for any solution with such attributes.
cast sizeof(buffer) to an int
This may be necessary if the usage of i warrants the use of int for correct semantics elsewhere in the code. The example in the question does not, so this would not be an appropriate solution in this case. Essentially, if making i a size_t here causes type agreement warnings elsewhere that cannot themselves be resolved by universal type agreement for all operands in an expression, then a cast may be appropriate. The aim should be to achieve zero warnings and a minimum of type casts.
some extremely convoluted suggestions that I did not have the patience to follow, generally involving vectors and/or iterators
If you are not prepared to elaborate on or even consider such advice, you would have done better to omit the "advice" from your question. In any case, STL containers are not always appropriate for a large segment of embedded targets; excessive code size increase and non-deterministic heap management are reasons to avoid them on many platforms and applications.
libraries that I cannot load in an embedded environment.
Not all embedded environments have equal constraints. The restriction is on your embedded environment, not by any means on all embedded environments. However, "loading libraries" to resolve or avoid type agreement issues seems like a sledgehammer to crack a nut.
free functions returning a valid int or long representing the byte count of T
It is not clear what that means. What is a "free function"? Is that just a non-member function? Such a function would necessarily contain a type cast internally, so what have you achieved other than hiding a type cast?
Don't use loops (gotta love that advice).
I doubt you needed to include that advice in your list. In any case, the problem is not limited to loops: you get the warning not because you are using a loop, but because you have used < with mismatched types.
My favorite solution is to use C++11 or newer and skip the whole manual size bounding entirely like so:
// assuming byte is defined by something like using byte = std::uint8_t;
void test()
{
    byte buffer[100];
    for (auto&& b: buffer)
    {
        b = 0;
    }
}
Alternatively, if I can't use the range-based for loop (but can still use C++11 or newer), my favorite syntax becomes:
void test()
{
    byte buffer[100];
    for (auto i = decltype(sizeof(buffer)){0}; i < sizeof(buffer); ++i)
    {
        buffer[i] = 0;
    }
}
Or for iterating backwards:
void test()
{
    byte buffer[100];
    // relies on the well-defined modular wraparound of unsigned integers:
    // decrementing i past zero makes it SIZE_MAX, so the condition fails
    for (auto i = sizeof(buffer) - 1; i < sizeof(buffer); --i)
    {
        buffer[i] = 0;
    }
}
The correct generic way is to use a loop iterator of type size_t, simply because this is the most correct type for describing an array size.
There is not much need for "tricks to avoid decrementing past zero", because the size of an object can never be negative.
If you find yourself needing negative numbers to describe a variable size, it is probably because you have some special case where you are iterating across an array backwards. If so, the "trick" to deal with it is this:
for (size_t i = 0; i < sizeof(array); i++)
{
    // sizeof(array) equals the element count only for 1-byte elements
    size_t index = sizeof(array) - 1 - i;
    array[index] = something;
}
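For comparison, the `i-- > 0` form mentioned in the question visits the same indices from the end without a second variable. A minimal sketch, where `reverse_indices` is a hypothetical name used only to make the visiting order observable:

```cpp
#include <cstddef>
#include <vector>

// Records the order in which indices are visited when walking backwards.
std::vector<std::size_t> reverse_indices(std::size_t n)
{
    std::vector<std::size_t> visited;
    // `i-- > 0` compares the old value of i with 0, then decrements it,
    // so the body sees n-1, n-2, ..., 0 and never runs when n == 0.
    for (std::size_t i = n; i-- > 0; )
    {
        visited.push_back(i);
    }
    return visited;
}
```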
However, size_t is often an inconvenient type to use in embedded systems, because it may end up as a larger type than what your MCU can handle with one instruction, resulting in needlessly inefficient code. It may then be better to use a fixed width integer such as uint16_t, if you know the maximum size of the array in advance.
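A sketch of that idea, assuming the array size is known at compile time (the helper name `clear_buffer` is hypothetical). Whether a 16-bit index is actually cheaper depends on the target MCU and compiler:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: zero a fixed-size byte array using a 16-bit index
// instead of size_t. The static_assert proves at compile time that every
// index of this particular array fits in std::uint16_t.
template <std::size_t N>
void clear_buffer(std::uint8_t (&buf)[N])
{
    static_assert(N <= std::numeric_limits<std::uint16_t>::max(),
                  "array too large for a 16-bit index");
    const std::uint16_t count = static_cast<std::uint16_t>(N);
    for (std::uint16_t i = 0; i < count; ++i)
    {
        buf[i] = 0;
    }
}
```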
Using plain int in an embedded system is almost certainly incorrect practice. Your variables must be of deterministic size and signedness - most variables in an embedded system are unsigned. Signed variables also lead to major problems whenever you need to use bitwise operators.
If you are able to use C++11, you could use decltype to obtain the actual type of what sizeof returns, for instance:
void test()
{
    byte buffer[100];
    // On macOS, decltype(sizeof(buffer)) is unsigned long; this passes
    // the compiler without warnings.
    for (decltype(sizeof(buffer)) i = 0; i < sizeof(buffer); ++i)
    {
        buffer[i] = 0;
    }
}

Is using unsigned int instead of std::vector<bool> or std::bitset a recommended practice?

I've seen some coding examples that recommend using an unsigned int to represent a bitmap:
unsigned int zero_rows {0};
for (auto i = 0; i < n_rows; ++i) {
    zero_rows |= (1u << i);
    …
}
Does this provide any benefit over using std::vector<bool>:
std::vector<bool> zero_rows(n_rows, false);
for (auto i = 0; i < n_rows; ++i) {
    zero_rows[i] = true;
    …
}
Another option I guess could be std::bitset, but I'm not really sure about the pros and cons of each yet. I'd just like to know what is the recommended practice.
The use of some size of unsigned integer to represent a fixed-size sequence of bits was the only option in C++ before the 1998 Standard, when std::vector<bool> and std::bitset were introduced. The practice was inherited from C, in which it is considered a competent programmer's proficiency, and it remains so considered in C++.
std::vector<bool> has come to be regarded with regret. See, e.g., vector<bool>: More Problems, Better Solutions and Effective STL Item 18.
std::bitset is considered fit for purpose.
The unsigned integer practice inherently represents a hand-rolled simulation of a fixed-size bit sequence by overt artifice, constrained and complicated by the fact that there are only a few sizes of unsigned integer (even if the chosen size is made precise). If you require a fixed-size bit sequence to be subject to operations that are all supported by the interface of std::bitset, then, other things being equal, the Standard Library's provision is to be preferred to hand-rolled code for all of the reasons to which the Library owes its existence.
You must be the judge whether, in the context of your application, other things are equal.
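To make the comparison concrete, here is a sketch of the same marking operation from the question written with each of the three options. The function names and the 32-bit width chosen for std::bitset are arbitrary choices for illustration:

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// 1. Hand-rolled bitmap. 1u (not 1) keeps the shift unsigned: shifting a 1
//    into the sign bit of a signed int is undefined behaviour before C++20.
//    Each r must satisfy 0 <= r < the number of bits in unsigned int.
unsigned int mark_with_uint(const std::vector<int>& rows)
{
    unsigned int bits = 0;
    for (int r : rows)
        bits |= (1u << r);
    return bits;
}

// 2. std::bitset: the width is fixed at compile time, and set(pos)
//    throws std::out_of_range if pos >= 32.
std::bitset<32> mark_with_bitset(const std::vector<int>& rows)
{
    std::bitset<32> bits;
    for (int r : rows)
        bits.set(static_cast<std::size_t>(r));
    return bits;
}

// 3. std::vector<bool>: the width is chosen at run time; operator[] does
//    no bounds checking, so each r must be < n_rows.
std::vector<bool> mark_with_vector_bool(const std::vector<int>& rows, std::size_t n_rows)
{
    std::vector<bool> bits(n_rows, false);
    for (int r : rows)
        bits[static_cast<std::size_t>(r)] = true;
    return bits;
}
```

The std::bitset version keeps the fixed width and bitwise operators of the hand-rolled integer while adding count(), test() and bounds-checked set().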

Dealing with size of stl containers

I'm rewriting a general-purpose library that I wrote before I learned the STL. It uses C-style arrays all the way. In many places there is code like this:
unsigned short maxbuffersize; // Maximum possible size of the buffer. Can be set by user.
unsigned short buffersize; // Current size of the buffer.
T *buffer; // The buffer itself.
The first thing I did was to change the code like this:
unsigned short maxbuffersize;
unsigned short buffersize;
std::vector<T> buffer;
And then:
typedef unsigned short BufferSize;
BufferSize maxbuffersize;
BufferSize buffersize;
std::vector<T> buffer;
And then I felt like I was doing a very bad thing and should reconsider my coding style. At first, BufferSize seemed like a very bad name for a type, but then all kinds of weird questions started popping up. How do I name the size type? Should I use my own type or reuse std::vector<T>::size_type? Should I cache the size of the container or use size() all the way? Should I allow the user to manually set the maximum size of the container, and if not, how do I check for overflow?
I know that there can't be a one-size-fits-all approach, therefore I'd like to hear the policies other coders and framework vendors use. The library I'm working on is cross-platform and general purpose, and is intended to be released into the public domain and used for decades. Thanks.
I think the default choice ought to be to get rid of both buffersize and maxbuffersize and use buffer.size() and buffer.capacity() throughout.
I would advise against caching the sizes unless you have very specific reasons to do this, backed by hard data from profiler runs. Caching would introduce extra complexity and the potential for the cache to get out of sync with the real thing.
Finally, in places where you feel bounds checking is warranted, you could use buffer.at(i). This will throw an exception if i is out of bounds.
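A sketch of those suggestions, assuming the library still wants a user-settable maximum (the `Buffer` class and its member names are hypothetical). The maximum is kept as its own value because capacity() reports allocated space, not a limit the vector enforces:

```cpp
#include <stdexcept>
#include <vector>

// The vector owns the element count, so there is no cached buffersize
// that can fall out of sync with it.
template <typename T>
class Buffer
{
public:
    typedef typename std::vector<T>::size_type size_type;

    explicit Buffer(size_type max_size) : max_size_(max_size)
    {
        data_.reserve(max_size);
    }

    // Enforces the user-set limit; push_back alone would just grow the vector.
    void push(const T& value)
    {
        if (data_.size() >= max_size_)
            throw std::length_error("Buffer is full");
        data_.push_back(value);
    }

    size_type size() const { return data_.size(); }
    size_type max_size() const { return max_size_; }

    // Bounds-checked access: throws std::out_of_range if i >= size().
    const T& at(size_type i) const { return data_.at(i); }

private:
    std::vector<T> data_;
    size_type max_size_;
};
```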
In general, I would advise using iterators to access your data. When you do this, you often don't explicitly query the size of the container at all. This also decouples you from std::vector altogether, and lets you simply switch to, for example, std::list if you later realize that it better suits your needs.
When you use iterators the need for vector.size() in general greatly decreases.
(When you do need it, use buffer.size() and buffer.capacity(), as aix says.)
For example:
typedef unsigned short BufferSize;
BufferSize maxbuffersize;
BufferSize buffersize;
std::vector<T> buffer;
for (unsigned short i = 0; i < maxbuffersize; ++i)
{
    // do something with buffer[i];
}
becomes
struct do_something
{
    void operator()(const T& t) const
    {
        // do something with t, the current element
    }
};
std::vector<T> buffer(maxbuffersize);
std::for_each(buffer.begin(), buffer.end(), do_something());
which is a little bit cleaner.
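With a C++11 compiler the functor can be replaced by a lambda. A minimal sketch, in which a hypothetical `scale_all` stands in for "do something":

```cpp
#include <algorithm>
#include <vector>

// The loop body becomes a lambda, and std::for_each walks the range
// without any index or size variable.
template <typename T>
void scale_all(std::vector<T>& buffer, T factor)
{
    std::for_each(buffer.begin(), buffer.end(), [factor](T& t) { t *= factor; });
}
```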
Keeping the size is useful for many structures, but it's a bit redundant for arrays/vectors, since the size is guaranteed to be the final index + 1. If you are worried about running past the end, an iterator approach such as the one mentioned would solve this, as well as most other issues regarding possible sizes for comparisons, etc.
It's pretty standard to define all of your types and their sizes in a header with the API, which sets them for different platforms and compilers; look at Windows with its definitions of LONG, ULONG, DWORD, etc. The old C convention is to prefix them with a unique name or initials, such as MYAPI_SIZETYPE. It's wordy, but it avoids any cross-platform confusion or compiler issues.
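A minimal sketch of such a header, reusing the MYAPI_ prefix from above; the MYAPI_LARGE_BUFFERS configuration macro is a hypothetical example of per-platform selection:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical public header: every size type the API exposes is chosen
// once, here, per build configuration.
#if defined(MYAPI_LARGE_BUFFERS)
typedef std::uint32_t MYAPI_SIZETYPE;
#else
typedef std::uint16_t MYAPI_SIZETYPE;
#endif

// Compile-time check documenting the smallest range the API promises.
static_assert(std::numeric_limits<MYAPI_SIZETYPE>::max() >= 65535u,
              "MYAPI_SIZETYPE must hold at least 65535");
```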

C++, best practices, int or size_t? [duplicate]

Possible Duplicate:
When to use std::size_t?
hello.
Assuming usage patterns are the same (i.e. no negative numbers), which is preferable to use for various indexes, int or size_t type?
Is there performance difference in your experience on 64-bit Intel between the two?
Thank you
size_t is the type that should be used for array indexing when you work with relatively generic arrays, i.e. when you have just an array of abstract chars, ints or something else.
When you are working with a specific array, i.e. an array that contains some elements specific for your application, you should normally already have a "type of choice" to count or to index the entities of that type in your application. That's the type you should use. For example, if some array contains the records for company employees, then you should already have a "type of choice" in your program that you use to designate the "quantity of employees". That's the type you should use for indexing arrays of employee records. It could be unsigned int, it could be employee_count_t or something like that. Using a naked size_t for that purpose is a design error.
Note also that size_t is a type not immediately intended for array indexing. It is a type intended to represent the size of the largest object in the program. It "works" for arrays by transitivity: arrays are objects, hence size_t is always enough to index an array. However, when you design a program it makes more sense to think in terms of generic containers, instead of thinking in terms of specific arrays. Today it might be an array; tomorrow you might have to switch to a linked list or a tree instead. In the general case, the range of size_t is not sufficient to represent the number of elements in an abstract container, which is why size_t in such cases is not a good choice.
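A sketch of that advice, where `employee_count_t`, `Employee` and `count_named` are hypothetical names. The single static_cast assumes the number of employees always fits in 32 bits:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The program's own "type of choice" for counting and indexing employees.
typedef std::uint32_t employee_count_t;

struct Employee
{
    std::string name;
};

// Returns how many employees have a non-empty name.
employee_count_t count_named(const std::vector<Employee>& staff)
{
    // One deliberate conversion at the container boundary.
    const employee_count_t total = static_cast<employee_count_t>(staff.size());
    employee_count_t named = 0;
    for (employee_count_t i = 0; i < total; ++i)
    {
        if (!staff[i].name.empty())
            ++named;
    }
    return named;
}
```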
It depends on what you are doing. If you are iterating over a vector, then use std::size_t:
for (std::size_t i = 0; i < vec.size(); i++) {
    // do something with vec[i]
}
However, beware of coding errors such as:
for (std::size_t i = 99; i >= 0; i--) {
    // This is an infinite loop
}
If you are just doing a loop, you might want to use just a plain int because of the situation above. There should be no performance difference between using int and std::size_t. If you need an exact size, then you should use neither int nor size_t, but rather the types defined in stdint.h.
The types aren't different in the sense you're implying. Generally int is 32 bits, and size_t is typically the width of the platform word (32 or 64 bits). I'd suggest you use size_t when dealing with files, buffers, and anything else that might describe an area of memory or a buffer.
Furthermore you should note that int is signed, while size_t is not.
Finally, int was historically used where size_t should be used now. However, int is still useful in its own right for other purposes.
Use size_t or ptrdiff_t; int might not be large enough to index all the elements of an array.

Where can I look up the definition of size_type for vectors in the C++ STL?

It seems safe to cast the result of my vector's size() function to an unsigned int. How can I tell for sure, though? My documentation isn't clear about how size_type is defined.
Do not assume the type of the container size (or anything else typed inside).
Today?
The best solution for now is to use:
std::vector<T>::size_type
Where T is your type. For example:
std::vector<std::string>::size_type i ;
std::vector<int>::size_type j ;
std::vector<std::vector<double> >::size_type k ;
(Using a typedef could make this more readable.)
The same goes for iterators, and all other types "inside" STL containers.
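A small sketch of the typedef suggestion; the `Names` typedef and `count_long_names` function are hypothetical. Nested types such as size_type and const_iterator are reached through the typedef without assuming what they actually are:

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> Names;

// Counts the names whose length is at least min_len.
Names::size_type count_long_names(const Names& names, Names::size_type min_len)
{
    Names::size_type count = 0;
    for (Names::const_iterator it = names.begin(); it != names.end(); ++it)
    {
        if (it->size() >= min_len)
            ++count;
    }
    return count;
}
```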
After C++0x?
Once the compiler is able to deduce the type of the variable, you'll be able to use the auto keyword. For example:
void doSomething(const std::vector<double> & p_aData)
{
    std::vector<double>::size_type i = p_aData.size() ; // Old/Current way
    auto j = p_aData.size() ; // New C++0x way, definition
    decltype(p_aData.size()) k; // New C++0x way, declaration
}
Edit: Question from JF
What if he needs to pass the size of the container to some existing code that uses, say, an unsigned int? – JF
This is a problem common to the use of the STL: You cannot do it without some work.
The first solution is to design the code to always use the STL type. For example:
typedef std::vector<int>::size_type VIntSize ;
VIntSize getIndexOfSomeItem(const std::vector<int> & p_aInt)
{
    return /* the found value, or some kind of std::npos */
}
The second is to make the conversion yourself, either with a static_cast or with a function that asserts if the value goes out of the range of the destination type (sometimes I see code using "char" because, "you know, the index will never go beyond 256" [I quote from memory]).
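A sketch of the second approach. It takes a std::size_t, which is what std::vector's size_type is with the default allocator on common implementations (treat that as an assumption for other containers). The name `to_unsigned_int` is hypothetical, and it throws rather than asserts so the failure case can be tested:

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Converts a size to unsigned int, refusing values that would not fit
// instead of silently truncating them.
unsigned int to_unsigned_int(std::size_t n)
{
    if (n > std::numeric_limits<unsigned int>::max())
        throw std::out_of_range("size does not fit in unsigned int");
    return static_cast<unsigned int>(n);
}
```

Usage: unsigned int count = to_unsigned_int(v.size());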
I believe this could be a full question in itself.
According to the standard, you cannot be sure. The exact type depends on your machine. You can look at the definition in your compiler's header implementations, though.
I can't imagine that it wouldn't be safe on a 32-bit system, but 64-bit could be a problem (since ints remain 32 bit). To be safe, why not just declare your variable to be vector<MyType>::size_type instead of unsigned int?
It should always be safe to cast it to size_t. unsigned int isn't enough on most 64-bit systems, and even unsigned long isn't enough on Windows (which uses the LLP64 model instead of the LP64 model most Unix-like systems use).
The C++ standard only states that size_t is found in <cstddef>, which provides the same identifiers as <stddef.h>. My copy of Harbison & Steele places the minimum and maximum values for size_t in <stdint.h>. That should give you a notion of how big your recipient variable needs to be for your platform.
Your best bet is to stick with integer types that are large enough to hold a pointer on your platform. In C99, that'd be intptr_t and uintptr_t, also officially located in <stdint.h>.
As long as you're sure that an unsigned int on your system will be large enough to hold the number of items you'll have in the vector you should be safe ;-)
I'm not sure how well this will work because I'm just thinking off the top of my head, but a compile-time assertion (such as BOOST_STATIC_ASSERT() or see Ways to ASSERT expressions at build time in C) might help. Something like:
BOOST_STATIC_ASSERT( sizeof( unsigned int) >= sizeof( std::vector<int>::size_type));