Comparison between signed and unsigned. Is static_cast the only solution? - c++

I use third-party containers that use int to store the size, and STL containers that use size_t. Very often I have to use both in the same loop, for example:
// vec is std::vector
// list is the third party container
assert(vec.size() == list.size()); // warning
for (size_t i = 0; i < vec.size(); i++)
{
    vec[i] = list[i]; // warning
}
So to fix the warnings I either do function-style casting, which I was told is C-style casting in disguise:
// vec is std::vector
// list is the third party container
assert(int(vec.size()) == list.size());
for (size_t i = 0; i < vec.size(); i++)
{
    vec[i] = list[int(i)];
}
Or I can use the even uglier solution that everyone recommends: the static_cast.
// vec is std::vector
// list is the third party container
assert(static_cast<int>(vec.size()) == list.size());
for (size_t i = 0; i < vec.size(); i++)
{
    vec[i] = list[static_cast<int>(i)];
}
I really don't want to static_cast.
Can the implicit conversion in this particular scenario be dangerous?
Would the function style be okay in my case?
If static_cast really is the only safe solution, should I cast the int to size_t or the size_t to int?
Thank you.

Binary operators (operator== here) convert a signed integer operand to unsigned if the other operand is unsigned. A negative signed integer then turns into a big positive one.
A container's element count should never be negative (anything else would break the principle of least surprise), so the implicit conversion should be safe here. But do check the implementation/documentation of the third-party int size() const to be sure.
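To see why the warning exists at all, here is a minimal sketch of the surprise that conversion can cause:
#include <iostream>

int main()
{
    int neg = -1;
    unsigned pos = 1;
    // The usual arithmetic conversions turn neg into unsigned before the
    // comparison, so -1 wraps around to the largest unsigned int value:
    std::cout << (neg < pos) << '\n'; // prints 0: -1 is "not less than" 1u
}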
You may like to preserve the semantics of the implicit conversion and use as_unsigned function to avoid the cast and better convey the intent:
#include <type_traits>

// Converts a signed integer to the corresponding unsigned type, making the
// otherwise-implicit conversion visible at the call site.
template<class T>
inline typename std::make_unsigned<T>::type as_unsigned(T a) {
    return static_cast<typename std::make_unsigned<T>::type>(a);
}
and then:
assert(vec.size() == as_unsigned(list.size()));

Related

Extend std::vector with range checking and signed size_type

I would like to use a class with the same functionality as std::vector, but
Replace std::vector<T>::size_type with some signed integer (like int64_t or simply int) instead of the usual size_t. It is very annoying to see warnings produced by a compiler in comparisons between signed and unsigned numbers when I use the standard vector interface. I can't just disable such warnings, because they really help to catch programming errors.
put assert(0 <= i && i < size()); inside operator[](int i) to check for out-of-range errors. As I understand it, this will be a better option than calling .at(), because I can disable assertions in release builds, so performance will be the same as in the standard implementation of the vector. It is almost impossible for me to use std::vector without manually checking the range before each operation, because operator[] is the source of almost all weird errors related to memory access.
The possible options that come to my mind are to
Inherit from std::vector. It is not a good idea, as said in the following question: Extending the C++ Standard Library by inheritance?.
Use composition (put std::vector inside my class) and repeat the entire interface of std::vector. This option forces me to track the current C++ standard, because the interface of some methods and iterators differs slightly across C++98/11/14/17. I would like to be sure that when C++20 becomes available, I can simply use it without reimplementing the whole vector interface.
This is more an answer to the underlying problem, read from the comment:
For example, I don't know how to write the following in a range-based-for way:
for (int i = a.size() - 2; i >= 0; i--) { a[i] = 2 * a[i+1]; }
You may change it to a generic one like this:
#include <iostream>
#include <vector>

std::vector<int> vec1{ 1, 2, 3, 4, 5, 6 };
std::vector<int> vec2 = vec1;

int main()
{
    // generic code
    for (auto it = vec1.rbegin() + 1; it != vec1.rend(); it++)
    {
        *it = 2 * *(it - 1);
    }
    // your code
    for (int i = vec2.size() - 2; i >= 0; i--)
    {
        vec2[i] = 2 * vec2[i + 1];
    }
    for (auto& el : vec1) { std::cout << el << std::endl; }
    for (auto& el : vec2) { std::cout << el << std::endl; }
}
Range-based for is not used here because it cannot access elements relative to the current position.
Regarding point 1: we hardly ever get those warnings here, because we use vectors' size_type where appropriate and/or cast to it if needed (with a 'checked' cast like boost::numeric_cast for safety). Is that not an option for you? Otherwise, write a function to do it for you, i.e. the non-const version would be something like
template<class T>
T& ati(std::vector<T>& v, std::int64_t i)
{
    // checked_cast stands in for a checked conversion such as boost::numeric_cast
    return v.at(checked_cast<typename std::vector<T>::size_type>(i));
}
And yes, inheriting is still a problem. And even if it weren't you'd break the definition of vector (and the Liskov substitution principle I guess), because the size_type is defined as
an unsigned integral type that can represent any non-negative value of difference_type
So it's down to composition, or a bunch of free functions for accessing with a signed size_type and a range check. Personally I'd go for the latter: less work, as easy to use, and you can still pass your vector to functions taking vectors without problems.
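For illustration, a minimal sketch of what the free-function route could look like, following the question's idea of assert-based checking (the names ssize_of and ati are illustrative, not an established API):
#include <cassert>
#include <cstdint>
#include <vector>

// Signed size, so comparisons against int indexes produce no warnings.
template <class T>
std::int64_t ssize_of(const std::vector<T>& v)
{
    return static_cast<std::int64_t>(v.size());
}

// Signed, range-checked access; the check disappears in release builds
// (NDEBUG), matching the performance goal from the question.
template <class T>
const T& ati(const std::vector<T>& v, std::int64_t i)
{
    assert(0 <= i && i < ssize_of(v));
    return v[static_cast<typename std::vector<T>::size_type>(i)];
}
(C++20 later standardized the signed-size part of this as std::ssize.)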
(This is more a comment than a real answer, but has some code, so...)
For the second part (range checking at runtime), a third option would be to use some macro trick:
#ifdef DEBUG
#define VECTOR_AT(v,i) v.at(i)
#else
#define VECTOR_AT(v,i) v[i]
#endif
This can be used this way:
std::vector<sometype> vect(somesize);
VECTOR_AT(vect,i) = somevalue;
Of course, this requires editing your code in a quite non-standard way, which may not be an option. But it does the job.
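If the macro is too intrusive, a function template achieves the same effect; a sketch, assuming the usual NDEBUG convention instead of the DEBUG macro above:
template <class Container>
typename Container::reference checked_at(Container& c, typename Container::size_type i)
{
#ifndef NDEBUG
    return c.at(i); // bounds-checked in debug builds
#else
    return c[i];    // unchecked in release builds
#endif
}
Usage is then checked_at(vect, i) = somevalue; with no macro involved.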

std::array vs array performance

If I want to build a very simple array like:
int myArray[3] = {1,2,3};
Should I use std::array instead?
std::array<int, 3> a = {{1, 2, 3}};
What are the advantages of using std::array over usual ones? Is it more performant? Just easier to handle for copy/access?
What are the advantages of using std::array over usual ones?
It has friendly value semantics, so that it can be passed to or returned from functions by value. Its interface makes it more convenient to find the size, and use with STL-style iterator-based algorithms.
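To make those advantages concrete, a small sketch (my own example, not from the original answer):
#include <algorithm>
#include <array>
#include <numeric>

// std::array can be passed and returned by value, unlike a raw array.
std::array<int, 3> doubled(std::array<int, 3> a) // passed by value: a copy
{
    for (int& x : a) x *= 2;
    return a;                                    // returned by value
}

int main()
{
    std::array<int, 3> a = {{1, 2, 3}};
    std::array<int, 3> b = doubled(a);             // a itself is unchanged
    std::sort(b.begin(), b.end());                 // works with STL algorithms
    return std::accumulate(b.begin(), b.end(), 0); // 2 + 4 + 6 = 12
}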
Is it more performant?
It should be exactly the same. By definition, it's a simple aggregate containing an array as its only member.
Just easier to handle for copy/access?
Yes.
A std::array is a very thin wrapper around a C-style array, basically defined as
template<typename T, size_t N>
struct array
{
    T _data[N];
    T& operator[](size_t);
    const T& operator[](size_t) const;
    // other member functions and typedefs
};
It is an aggregate, and it allows you to use it almost like a fundamental type (i.e. you can pass it by value, assign it, etc., whereas a standard C array cannot be assigned or copied directly to another array). You should take a look at some standard implementation (jump to definition from your favourite IDE or directly open <array>); it is a piece of the C++ standard library that is quite easy to read and understand.
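For instance, assignment works for std::array but not for a raw array; a minimal sketch:
#include <array>

int main()
{
    int c1[3] = {1, 2, 3};
    int c2[3];
    // c2 = c1;               // error: C arrays cannot be assigned
    c2[0] = c1[0];            // raw arrays must be copied element by element
    std::array<int, 3> a1 = {{1, 2, 3}};
    std::array<int, 3> a2;
    a2 = a1;                  // fine: copies all elements at once
    return a2[2] + c2[0];     // 3 + 1
}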
std::array is designed as a zero-overhead wrapper for C arrays that gives them the "normal" value-like semantics of the other C++ containers.
You should not notice any difference in runtime performance while you still get to enjoy the extra features.
Using std::array instead of int[] style arrays is a good idea if you have C++11 or boost at hand.
Is it more performant?
It should be exactly the same. By definition, it's a simple aggregate containing an array as its only member.
The situation can be more complicated, though: std::array does not always produce assembly code identical to that of a C array; it depends on the specific platform and compiler.
I tested this specific situation on godbolt:
#include <array>
#include <cstddef> // size_t

void test(double* const C, const double* const A,
          const double* const B, const size_t size) {
    for (size_t i = 0; i < size; i++) {
        //double arr[2] = {0.e0};
        std::array<double, 2> arr = {0.e0}; // different to double arr[2] for some compilers
        for (size_t j = 0; j < size; j++) {
            arr[0] += A[i] * B[j];
            arr[1] += A[j] * B[i];
        }
        C[i] += arr[0];
        C[i] += arr[1];
    }
}
GCC and Clang produce identical assembly code for both the C-array version and the std::array version.
MSVC and ICPC, however, produce different assembly code for each array version. (I tested ICPC19 with -Ofast and -Os; MSVC -Ox and -Os)
I have no idea why this is the case (I would indeed expect exactly identical behavior of std::array and C array). Maybe different optimization strategies are employed.
As a little extra:
There seems to be a bug in ICPC with
#pragma simd
for vectorization when using the c-array in some situations
(the c-array code produces a wrong output; the std::array version works fine).
Unfortunately, I do not have a minimal working example for that yet, since I discovered that problem while optimizing a quite complicated piece of code.
I will file a bug report with Intel once I am sure that I did not just misunderstand something about C arrays, std::array, and #pragma simd.
std::array has value semantics while raw arrays do not. This means you can copy std::array and treat it like a primitive value. You can receive them by value or reference as function arguments and you can return them by value.
If you never copy a std::array, then there is no performance difference from a raw array. If you do need to make copies, then std::array will do the right thing and should still give equal performance.
You will get the same performance results using std::array and a C array. If you run this code:
#include <array>
#include <QDebug>
#include <QElapsedTimer>
#include <QPair>

int main()
{
    std::array<QPair<int, int>, 9> *m_array = new std::array<QPair<int, int>, 9>();
    QPair<int, int> *carr = new QPair<int, int>[10];
    QElapsedTimer timer;
    timer.start();
    for (int j = 0; j < 1000000000; j++)
    {
        for (int i = 0; i < 9; i++)
        {
            m_array->operator[](i).first = i + j;
            m_array->operator[](i).second = j - i;
        }
    }
    qDebug() << "std::array<QPair<int, int>" << timer.elapsed() << "milliseconds";
    timer.start();
    for (int j = 0; j < 1000000000; j++)
    {
        for (int i = 0; i < 9; i++)
        {
            carr[i].first = i + j;
            carr[i].second = j - i;
        }
    }
    qDebug() << "QPair<int, int> took" << timer.elapsed() << "milliseconds";
    return 0;
}
You will get these results:
std::array<QPair<int, int> 5670 milliseconds
QPair<int, int> took 5638 milliseconds
Mike Seymour is right, if you can use std::array you should use it.

C++ self-enforcing a standard: size_t

Simple question,
Would it be good for me to force myself to start using size_t (or unsigned longs?) in places where I would normally use ints when dealing with arrays or other large data structures?
Say you have a vector pointer:
auto myVectorPtr = &myVector;
Unknown to you, the size of this vector is larger than:
std::numeric_limits<int>::max();
and you have a loop:
for(int i = 0; i < myVectorPtr->size(); ++i)
wouldn't it be preferable to use
for(size_t i = 0; i < myVectorPtr->size(); ++i)
to avoid running into overflows?
I guess my question really is: are there any side effects of using size_t (or unsigned long) in arithmetic and other common operations? Is there anything I need to watch out for if I start using size_t (or unsigned long) instead of the classic int?
size_t is certainly better than int. The safest thing to do would be to use the actual size_type of the container, e.g.:
// note: decltype(*myVectorPtr) is a reference type, so it must be stripped
// with std::remove_reference (<type_traits>) before ::size_type can be used
for (typename std::remove_reference<decltype(*myVectorPtr)>::type::size_type i = 0; i < myVectorPtr->size(); ++i)
Unfortunately auto cannot be used here because it would deduce its type from 0, not from the size() call.
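With C++11 you can instead apply decltype to the size() call itself, which avoids both the 0 deduction problem and the reference issue; a sketch:
for (decltype(myVectorPtr->size()) i = 0; i < myVectorPtr->size(); ++i)
{
    // ...
}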
It reads a bit nicer to use iterator or range-based interfaces:
for (auto iter = begin(*myVectorPtr); iter != end(*myVectorPtr); ++iter)
or
for (auto &&item : *myVectorPtr)

Why does sizeof operator fail to work inside function template?

I am trying to learn C++ function templates. I am passing an array as a pointer to my function template and, inside it, trying to find the size of the array. Here is the function template that I use:
template<typename T>
T* average(T *arr)
{
    T *ansPtr, ans, sum = 0.0;
    size_t sz = sizeof(arr)/sizeof(arr[0]);
    cout << "\nSz is " << sz << endl;
    for(int i = 0; i < sz; i++)
    {
        sum = sum + arr[i];
    }
    ans = (sum/sz);
    ansPtr = &ans;
    return ansPtr;
}
The cout statement displays the size of arr as 1 even when I am passing a pointer to an array of 5 integers. Now I know this might be a duplicate of questions I referred to earlier, but I need a better explanation.
The only thing I could come up with is that since templates are invoked at runtime, and sizeof is a compile-time operator, the compiler just ignores the line
int sz = sizeof(arr)/sizeof(arr[0]);
since it does not know the exact type of arr until it actually invokes the function.
Is that correct, or am I missing something here? Also, is it reliable to pass a pointer to an array to function templates?
T *arr
This is C++ for "arr is a pointer to T". So sizeof(arr) means "size of the pointer arr", not "size of the array arr". That's the crucial flaw in that plan.
To get the size of an array, the function needs to operate on arrays, obviously not on pointers. As everyone knows (right?) arrays are not pointers.
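A small sketch of the difference (the printed sizes are typical values for a 64-bit platform, not guarantees):
#include <cstdio>

void f(int *p)                       // an "int arr[]" parameter would decay to this too
{
    std::printf("%zu\n", sizeof(p)); // size of a pointer, e.g. 8
}

int main()
{
    int a[5] = {1, 2, 3, 4, 5};
    std::printf("%zu\n", sizeof(a)); // size of the array: 5 * sizeof(int), e.g. 20
    f(a);                            // the array decays to a pointer here
}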
Furthermore, an average function should return an average value. But T* is a "pointer to T". An average function should not return a pointer to a value. That is not a value.
The pointer return type is not even the worst offense: returning a pointer to a local variable is. Why would you want to steal hotel room keys?
template<typename T, std::size_t sz>
T average(T (&arr)[sz])
{
    T ans, sum = 0.0;
    cout << "\nSz is " << sz << endl;
    for(std::size_t i = 0; i < sz; i++)
    {
        sum = sum + arr[i];
    }
    ans = (sum/sz);
    return ans;
}
If you want to be able to access the size of a passed parameter, you'd have to make that a template parameter, too:
template<typename T, size_t Len>
T average(const T (&arr)[Len])
{
    T sum = T();
    cout << "\nSz is " << Len << endl;
    for(size_t i = 0; i < Len; i++)
    {
        sum = sum + arr[i];
    }
    return (sum/Len);
}
You can then omit the sizeof, obviously. And you cannot accidentally pass a dynamically allocated array, which is a good thing. On the downside, the template will get instantiated not only once for every type, but once for every size. If you want to avoid duplicating the bulk of the code, you could use a second function template which accepts pointer and length and returns the average. That can then be called from an inline function.
template<typename T>
T average(const T* arr, size_t len)
{
    T sum = T();
    cout << "\nSz is " << len << endl;
    for(size_t i = 0; i < len; i++)
    {
        sum = sum + arr[i];
    }
    return (sum/len);
}

template<typename T, size_t Len>
inline T average(const T (&arr)[Len])
{
    return average(arr, Len);
}
Also note that returning the address of a variable which is local to the function is a very bad idea, as it will not outlive the function. So better to return a value and let the compiler take care of optimizing away unneccessary copying.
Arrays decay to pointers when passed as a parameter, so you're effectively getting the size of the pointer. It has nothing to do with templates, it's how the language is designed.
Others have pointed out the immediate errors, but IMHO there are two important points that they haven't addressed, both of which I would consider errors if they occurred in production code.

First, why aren't you using std::vector? For historical reasons, C-style arrays are broken and generally should be avoided. There are exceptions, but they mostly involve static initialization of static variables. You should never pass C-style arrays as a function argument, because they create the sort of problems you have encountered. (It's possible to write functions which can deal with both C-style arrays and std::vector efficiently. The function should be a function template, however, which takes two iterators of the template type.)

Second, why aren't you using the functions in the standard library? Your function can be written in basically one line:
#include <iterator> // std::distance
#include <numeric>  // std::accumulate

template <typename ForwardIterator>
typename ForwardIterator::value_type
average(ForwardIterator begin, ForwardIterator end)
{
    return std::accumulate(begin, end,
                           typename ForwardIterator::value_type())
         / std::distance(begin, end);
}
(This function, of course, isn't reliable for floating point types, where rounding errors can make the results worthless. Floating point raises a whole set of additional issues. And it probably isn't really reliable for the integral types either, because of the risk of overflow. But these are more advanced issues.)
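For reference, a quick usage sketch of the function above (assuming the average template is in scope; note that as written it requires class-type iterators that expose value_type, so for raw pointers you would switch to std::iterator_traits):
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3, 4, 5};
    std::cout << average(v.begin(), v.end()) << '\n'; // prints 3
}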

How do I deal with "signed/unsigned mismatch" warnings (C4018)?

I work with a lot of calculation code written in C++ with high performance and low memory overhead in mind. It uses STL containers (mostly std::vector) a lot, and iterates over those containers in almost every single function.
The iterating code looks like this:
for (int i = 0; i < things.size(); ++i)
{
    // ...
}
But it produces the signed/unsigned mismatch warning (C4018 in Visual Studio).
Replacing int with some unsigned type is a problem because we frequently use OpenMP pragmas, and it requires the counter to be int.
I'm about to suppress the (hundreds of) warnings, but I'm afraid I've missed some elegant solution to the problem.
On iterators. I think iterators are great when applied in appropriate places. The code I'm working with will never change random-access containers into std::list or something (so iterating with int i is already container agnostic), and will always need the current index. And all the additional code you need to type (iterator itself and the index) just complicates matters and obfuscates the simplicity of the underlying code.
It's all in your things.size() type. It isn't int, but size_t (from <cstddef>), which is an alias for some "usual" unsigned type, e.g. unsigned int on x86_32.
The operator < cannot compare two operands of different signedness directly: the usual arithmetic conversions kick in, and the signed operand is converted to unsigned. The compiler emits the warning because a negative signed value would silently turn into a huge unsigned one.
It would be correct to write it like
for (size_t i = 0; i < things.size(); ++i) { /**/ }
or even faster
for (size_t i = 0, ilen = things.size(); i < ilen; ++i) { /**/ }
Ideally, I would use a construct like this instead:
for (std::vector<your_type>::const_iterator i = things.begin(); i != things.end(); ++i)
{
    // if you ever need the distance, you may call std::distance;
    // it won't cause any overhead because the compiler will likely optimize the call
    size_t distance = std::distance(things.begin(), i);
}
This has the neat advantage that your code suddenly becomes container-agnostic.
And regarding your problem, if some library you use requires you to use int where an unsigned int would better fit, their API is messy. Anyway, if you are sure that those int are always positive, you may just do:
int int_distance = static_cast<int>(distance);
Which will specify clearly your intent to the compiler: it won't bug you with warnings anymore.
If you can't/won't use iterators and if you can't/won't use std::size_t for the loop index, make a .size() to int conversion function that documents the assumption and does the conversion explicitly to silence the compiler warning.
#include <cassert>
#include <cstddef>
#include <limits>

// When using int loop indexes, use size_as_int(container) instead of
// container.size() in order to document the inherent assumption that the size
// of the container can be represented by an int.
template <typename ContainerType>
/* constexpr */ int size_as_int(const ContainerType &c) {
    const auto size = c.size(); // if no auto, use `typename ContainerType::size_type`
    assert(size <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    return static_cast<int>(size);
}
Then you write your loops like this:
for (int i = 0; i < size_as_int(things); ++i) { ... }
The instantiation of this function template will almost certainly be inlined. In debug builds, the assumption will be checked. In release builds, it won't be and the code will be as fast as if you called size() directly. Neither version will produce a compiler warning, and it's only a slight modification to the idiomatic loop.
If you want to catch assumption failures in the release version as well, you can replace the assertion with an if statement that throws something like std::out_of_range("container size exceeds range of int").
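A sketch of that throwing variant (the name size_as_int_checked is mine, and the same representability assumption applies):
#include <limits>
#include <stdexcept>

// Illustrative release-checked variant of size_as_int above.
template <typename ContainerType>
int size_as_int_checked(const ContainerType &c) {
    if (c.size() > static_cast<typename ContainerType::size_type>(
            std::numeric_limits<int>::max()))
        throw std::out_of_range("container size exceeds range of int");
    return static_cast<int>(c.size());
}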
Note that this solves both the signed/unsigned comparison as well as the potential sizeof(int) != sizeof(Container::size_type) problem. You can leave all your warnings enabled and use them to catch real bugs in other parts of your code.
You can use:
the size_t type, to remove the warning messages
iterators + distance (like the first hint)
only iterators
a function object
For example:
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <vector>

// simple class that outputs its value
class ConsoleOutput
{
public:
    ConsoleOutput(int value) : m_value(value) { }
    int Value() const { return m_value; }
private:
    int m_value;
};

// function object
class Predicat
{
public:
    void operator()(ConsoleOutput const& item)
    {
        std::cout << item.Value() << std::endl;
    }
};

int main()
{
    // fill list
    std::vector<ConsoleOutput> list;
    list.push_back(ConsoleOutput(1));
    list.push_back(ConsoleOutput(8));
    // 1) using size_t
    for (size_t i = 0; i < list.size(); ++i)
    {
        std::cout << list.at(i).Value() << std::endl;
    }
    // 2) iterators + distance
    std::vector<ConsoleOutput>::iterator itDistance = list.begin(), endDistance = list.end();
    for ( ; itDistance != endDistance; ++itDistance)
    {
        // int or size_t
        int const position = static_cast<int>(std::distance(list.begin(), itDistance));
        std::cout << list.at(position).Value() << std::endl;
    }
    // 3) iterators
    std::vector<ConsoleOutput>::const_iterator it = list.begin(), end = list.end();
    for ( ; it != end; ++it)
    {
        std::cout << (*it).Value() << std::endl;
    }
    // 4) function objects
    std::for_each(list.begin(), list.end(), Predicat());
}
C++20 now has std::cmp_less
In C++20, we have the standard constexpr functions
std::cmp_equal
std::cmp_not_equal
std::cmp_less
std::cmp_greater
std::cmp_less_equal
std::cmp_greater_equal
added in the <utility> header, exactly for this kind of scenarios.
Compare the values of two integers t and u. Unlike builtin comparison operators, negative signed integers always compare less than (and not equal to) unsigned integers: the comparison is safe against lossy integer conversion.
That means, if (for some weird reason) one must keep int i as the loop variable and needs to compare it with an unsigned integer, that can be done:
#include <utility> // std::cmp_less

for (int i = 0; std::cmp_less(i, things.size()); ++i)
{
    // ...
}
This also covers the case where -1 (an int) is mistakenly compared with an unsigned int: the implicit conversion wraps -1 around to a huge value, so the following will not give you an error:
static_assert(1u < -1);
But the same comparison with std::cmp_less will:
static_assert(std::cmp_less(1u, -1)); // error: 1u is not less than -1
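For pre-C++20 code bases, roughly the same logic can be reproduced by hand. A simplified sketch of the idea (not the standard library's exact implementation; assumes C++17 for if constexpr):
#include <type_traits>

// Illustrative reimplementation of std::cmp_less.
template <class T, class U>
constexpr bool cmp_less_sketch(T t, U u) noexcept
{
    if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
        return t < u;   // same signedness: ordinary comparison
    else if constexpr (std::is_signed_v<T>)
        // a negative t is less than any unsigned u; otherwise compare as unsigned
        return t < 0 || std::make_unsigned_t<T>(t) < u;
    else
        // no unsigned t is less than a negative u; otherwise compare as unsigned
        return u >= 0 && t < std::make_unsigned_t<U>(u);
}

static_assert(!cmp_less_sketch(1u, -1)); // agrees with std::cmp_less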
I can also propose the following solution for C++11:
for (auto p = 0U; p < sys.size(); p++) {
}
(auto p = 0 would deduce int, so the U suffix is needed to make p unsigned.)
I will give you a better idea:
for (decltype(things.size()) i = 0; i < things.size(); i++)
{
    //...
}
decltype is
Inspects the declared type of an entity or the type and value category
of an expression.
So it deduces the type of things.size(), and i will have the same type as things.size(). Thus,
i < things.size() is evaluated without any warning.
I had a similar problem. Using size_t was not working, because a backwards loop with an unsigned index never terminates (i >= 0 is always true for unsigned i). The following worked for me:
for (int i = things.size() - 1; i >= 0; i--)
{
    //...
}
I would just do
int pnSize = primeNumber.size();
for (int i = 0; i < pnSize; i++)
    cout << primeNumber[i] << ' ';