Why do I need to use `size_t` in C++? - c++

As a beginner, I'm really confused about size_t. I could use int, float, or other types, so why declare a size_t type at all? I don't see its advantages.
I've viewed some pages, but I still can't understand it.

Its main advantage is that it's the right tool for the job.
size_t is literally defined to be big enough to represent the size of any object on your platform. The others are not. So, when you want to store the size of an object, why would you use anything else?
You can use int if you like, but you'll be deliberately choosing the inferior option that leads to bugs. I don't quite understand why you'd want to do so, but hey it's your code.
If you choose to use float, though, please tell us what program you're writing so we can avoid it. :)

Using a float would be horrible, since that would be a misuse of floating-point types; on top of that, type promotion would mean that multiplying the size by anything would take place in floating point!
Using an int would also be horrible, since the specifics of int are intentionally loosely defined by the C++ standard (it could be as small as 16 bits).
But a size_t type is guaranteed to adequately represent the size of pretty much anything and certainly the sizes of containers in the C++ standard library. Its specific details are dependent on a particular platform and architecture. The fact that it's an unsigned type is the subject of much debate. (I personally believe it was a mistake to make it unsigned as it can mess up code using relational operators and introduce pernicious bugs that are difficult to spot).

I would advise you to use size_t whenever you want to store the sizes of classes or structures, or when you deal with raw memory (e.g. storing the size of a raw memory block, or using it as an index into a raw array). However, for indexing/iterating over standard containers (such as std::vector), I recommend using the underlying size type of the given container (e.g. std::vector::size_type).

Related

std::bitset<N> implementation causes size overhead

It seems that std::bitset<N> is, under the hood, an array of unsigned longs, which means there is a (heavy?) overhead when N is small: sizeof(std::bitset<8>) is 8 bytes!
Is there a reason the type of the underlying array is not itself a template parameter? Why does the implementation not use uint32_t/uint16_t/uint8_t when more appropriate? I do not see anything in the implementation that prevents this.
I am guessing I am missing a particular reason, but I'm unsure how to look for it; or maybe there is no reason at all? Since this is such a simple container, I am not able to understand how the zero-overhead principle of C++ seems to be avoided here.
GCC Impl: https://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a00775_source.html
I believe clang is similar (used sizeof to confirm)
I am not able to understand how the zero overhead principle of C++ seems to be avoided here.
The zero-overhead principle is a principle, not an absolute rule of C++.
Many people use std::vector in contexts where a compile-time fixed capacity would be useful. Such a type could have only two pointers instead of three and thus be 50% smaller. Many people use std::string in contexts where an immutable string would work just as well if not better; it would reduce the size of the string (ignoring SSO), as well as its complexity. And so forth.
These all represent inefficiencies relative to the standard type. No standard library type can handle every possible usage scenario. The goal for such types is to be broadly useful, not perfect.
There is nothing preventing someone from writing a bitset-style type with the exact same interface which has a user-provided underlying type. But the standard has no such type.
Indeed, there's nothing preventing implementations of bitset from choosing an underlying type based on the given number of bits. Your implementation doesn't do that, but it could have.

Is it a good idea to base a non-owning bit container on std::vector<bool>? std::span?

In a couple of projects of mine I have had an increasing need to deal with contiguous sequences of bits in memory - efficiently (*). So far I've written a bunch of inline-able standalone functions, templated on the choice of a "bit container" type (e.g. uint32_t), for getting and setting bits, applying 'or' and 'and' to their values, locating the container, converting lengths in bits to sizes in bytes or lengths in containers, etc. ... it looks like it's class-writing time.
I know the C++ standard library has a specialization of std::vector<bool>, which is considered by many to be a design flaw, as its iterators do not expose actual bools, but rather proxy objects. Whether that's a good idea or a bad one for a specialization, it's definitely something I'm considering: an explicit bit-proxy class, which will hopefully "always" be optimized away (with a nice greasing-up with constexpr, noexcept and inline). So, I was thinking of possibly adapting std::vector code from one of the standard library implementations.
On the other hand, my intended class:
Will never own the data / the bits - it'll receive a starting bit container address (assuming alignment) and a length in bits, and won't allocate or free.
It will not be able to resize the data, dynamically or otherwise - not even while retaining the same amount of space, like std::vector::resize(); its length will be fixed during its lifespan/scope.
It shouldn't know anything about the heap (and should work when there is no heap).
In this sense, it's more like a span class for bits. So maybe start out with a span then? I don't know, spans are still not standard; and there are no proxies in spans...
So what would be a good basis (edit: NOT a base class) for my implementation? std::vector<bool>? std::span? Both? None? Or - maybe I'm reinventing the wheel and this is already a solved problem?
Notes:
The bit sequence length is known at run time, not compile time; otherwise, as @SomeProgrammerDude suggests, I could use std::bitset.
My class doesn't need to "be-a" span or "be-a" vector, so I'm not thinking of specializing any of them.
(*) - So far not SIMD-efficiently but that may come later. Also, this may be used in CUDA code where we don't SIMDize but pretend the lanes are proper threads.
Rather than std::vector or std::span I suspect an implementation of your class would share more in common with std::bitset, since it is pretty much the same thing, except with a (fixed) runtime-determined size.
In fact, you could probably take a typical std::bitset implementation and move the <size_t N> template parameter into the class as a size_t size_ member (or whatever name you like), and you'll have your dynamic bitset class with almost no changes. You may want to get rid of anything you consider cruft, like the constructors that take std::string and friends.
The last step is then to remove ownership of the underlying data: basically you'll remove the creation of the underlying array in the constructor and maintain a view of an existing array with some pointers.
If your clients disagree on what the underlying unsigned integer type to use for storage (what you call the "bit container"), then you may also need to make your class a template on this type, although it would be simpler if everyone agreed on say uint64_t.
As far as std::vector<bool> goes, you don't need much from that: everything that vector does that you want, std::bitset probably does too: the main thing that vector adds is dynamic growth - but you've said you don't want that. vector<bool> has the proxy object concept to represent a single bit, but so does std::bitset.
From std::span you take the idea of non-ownership of the underlying data, but I don't think this actually represents a lot of underlying code. You might want to consider the std::span approach of having either a compile-time known size or a runtime provided size (indicated by Extent == std::dynamic_extent) if that would be useful for you (mostly if you sometimes use compile-time sizes and could specialize some methods to be more efficient in that case).

Is there a reason not to use fixed width types?

I'm new to C++.
I was learning about types, their memory uses and the differences in their memory size based on architecture. Is there any downside to using fixed-width types such as int32_t?
The only real downside might be if you want your code to be portable to a system that doesn't have a 32-bit integer type. In practice those are pretty rare, but they are out there.
C++ has access to the C99 (and newer) integer types via cstdint, which will give you access to the int_leastN_t and int_fastN_t types which might be the most portable way to get specific bit-widths into your code, should you really happen to care about that.
The original intent of the int type was for it to represent the natural size of the architecture you were running on; you could assume that any operations on it were the fastest possible for an integer type.
These days the picture is more complicated. Cache effects or vector instruction optimization might favor using an integer type that is smaller than the natural size.
Obviously if your algorithm requires an int of at least a certain size, you're better off being explicit about it.
E.g.
To save space, use int_least32_t
To save time, use int_fast32_t
But in actuality, I personally use long (at least 32-bit) and int (at least 16-bit) from time to time simply because they are easier to type.
(Besides, int32_t is optional, not guaranteed to exist.)

C++ Variable Width Bit Field

I'm writing a program that is supposed to manipulate very long strings of boolean values. I was originally storing them as a dynamic array of unsigned long long int variables and running C-style bitwise operations on them.
However, I don't want the overhead that comes with having to iterate over an array even if the processor is doing it at the machine code level - i.e. it is my belief that the compiler is probably more efficient than I am.
So, I'm wondering if there's a way to store them as a bit field. The only problem with that is that I heard the size needed to be a compile-time constant for that to work, and I don't particularly care to do that, as I don't know how many bits I need when the program starts. Is there a way to do this?
As per the comments, std::bitset or std::vector<bool> are probably what you need. bitset is fixed-length, vector<bool> is dynamic.
vector<bool> is a specialization of vector that only uses one bit per value, rather than sizeof(bool), like you might expect... While good for memory use, this exception is actually disliked by the standards body these days, because (among other things) vector<bool> doesn't fulfil the same contract that vector<T> does - it returns proxy objects instead of references, which wreaks havoc in generic code.

Correct storage of container size on 32, 64 bit

I am currently converting an application to 64 bit.
I have some occurrences of the following pattern:
class SomeOtherClass;
class SomeClass
{
    std::vector<SomeOtherClass*> mListOfThings;

    void SomeMemberFunction(void)
    {
        // Needs to know the size of the member list variable
        unsigned int listSize = mListOfThings.size();
        // use listSize in further computations
        // ...
    }
};
Obviously, in a practical case I will not have more than INT_MAX items in my list. But I wondered if there is consensus about the 'best' way to represent this type.
Each collection defines its own return type for size(), so my first approximation would be:
std::vector<SomeOtherClass*>::size_type listSize = mListOfThings.size();
I would assume this to be correct, but (personally) I don't find this 'easy reading', so -1 for clarity.
For a C++11-aware compiler I could write
auto listSize = mListOfThings.size();
which is clearly more readable.
So my question, is the latter indeed the best way to handle storing container sizes in a variable and using them in computations, regardless of underlying architecture (win32, win64, linux, macosx) ?
What exactly you want to use is a matter of how "purist" you want your code to be.
If you're on C++11, you can just use auto and be done with it.
Otherwise, in extremely generic code (which is designed to work with arbitrary allocators), you can use the container's nested typedef size_type. That is taken verbatim from the container's allocator.
In normal use of standard library containers, you can use std::size_t. That is the size_type used by the default allocators, and is the type guaranteed to be able to store any object size.
I wouldn't recommend using [unsigned] int, as that will likely be smaller than necessary on 64-bit platforms (it's usually left at 32 bits, although this of course depends on compiler and settings). I've actually seen production code fail due to unsigned int not being enough to index a container.
It depends on why you need the size, and what is going to be in the vector. Internally, vector uses std::size_t. But that's an unsigned type, inappropriate for numerical values. If you just want to display the value, or something, fine; but if you're using it in any way as a numerical value, the unsignedness will end up biting you.
Realistically, there are a lot of times the semantics of the code ensure that the number of values cannot be more than INT_MAX. For example, when evaluating financial instruments, the maximum number of elements is less than 20000, so there's no need to worry about overflowing an int. In other cases, you'll validate your input first, to ensure that there will never be overflow. If you can't do this, the best solution is probably ptrdiff_t (which is the type you get from subtracting two iterators). Or, if you're using non-standard allocators, MyVectorType::difference_type.
Not sure if you've already considered this, but what is wrong with size_t?
It is what your compiler uses for the sizes of built-in containers (i.e. arrays).