Simple question,
Would it be good for me to force myself to start using size_t (or unsigned longs?) in places where I would normally use ints when dealing with arrays or other large data structures?
Say you have a vector pointer:
auto myVectorPtr = &myVector;
Unknown to you, the size of this vector is larger than:
std::numeric_limits<int>::max();
and you have a loop:
for(int i = 0; i < myVectorPtr->size(); ++i)
wouldn't it be preferable to use
for(size_t i = 0; i < myVectorPtr->size(); ++i)
to avoid running into overflows?
I guess my question really is: are there any side effects of using size_t (or unsigned longs?) in arithmetic and other common operations? Is there anything I need to watch out for if I started using size_t (or unsigned longs?) instead of the classic int?
size_t is certainly better than int. The safest thing to do would be to use the actual size_type of the container, e.g.:
for (std::remove_reference<decltype(*myVectorPtr)>::type::size_type i = 0; i < myVectorPtr->size(); ++i)
Unfortunately auto cannot be used here because it would deduce its type from 0, not from the size() call.
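A shorter spelling with the same effect (a sketch of my own, not part of the original answer): let decltype deduce the type directly from the size() call, which is exactly the container's size_type:
for (decltype(myVectorPtr->size()) i = 0; i < myVectorPtr->size(); ++i)
{
    // i has the container's size_type, so there is no signed/unsigned mismatch
}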
It reads a bit nicer to use iterator or range-based interfaces:
for (auto iter = begin(*myVectorPtr); iter != end(*myVectorPtr); ++iter)
or
for (auto &&item : *myVectorPtr)
Related
I use third party containers that use int to store the size. I also use stl containers which use size_t to store size.
I very often in my code have to use both in the same loop, like for example:
// vec is std::vector
// list is the third party container
assert(vec.size() == list.size()); // warning
for(size_t i = 0; i < vec.size(); i++)
{
    vec[i] = list[i]; // warning
}
So to fix it, I either have to use function-style casting, which I was told is C-style casting in disguise:
// vec is std::vector
// list is the third party container
assert(int(vec.size()) == list.size());
for(size_t i = 0; i < vec.size(); i++)
{
    vec[i] = list[int(i)];
}
Or I can use the even uglier solution that everyone recommends: static_cast.
// vec is std::vector
// list is the third party container
assert(static_cast<int>(vec.size()) == list.size());
for(size_t i = 0; i < vec.size(); i++)
{
    vec[i] = list[static_cast<int>(i)];
}
I really don't want to static_cast.
Can the implicit conversion in this particular scenario be dangerous?
Would the function style be okay in my case?
If static_cast is really the only safe solution, should I cast the int to size_t or the size_t to int?
Thank you.
Binary operators (operator== here) convert signed integers to unsigned if one of the arguments is unsigned. A signed negative integer turns into a big positive one.
A container's element count should never be negative (anything else would break the principle of least surprise), so the implicit conversion should be safe here. But be sure to check the implementation/documentation of the third-party int size() const.
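For illustration (a minimal sketch of my own, not from the original answer), this is what that conversion does to a negative value:
#include <iostream>

int main() {
    int negative = -1;
    unsigned int one = 1;
    // -1 is converted to unsigned before the comparison and becomes a huge value,
    // so this prints "true" even though -1 < 1 mathematically.
    std::cout << std::boolalpha << (negative > one) << '\n';
}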
You may like to preserve the semantics of the implicit conversion and use as_unsigned function to avoid the cast and better convey the intent:
#include <type_traits>
template<class T>
inline typename std::make_unsigned<T>::type as_unsigned(T a) {
    return static_cast<typename std::make_unsigned<T>::type>(a);
}
and then:
assert(vec.size() == as_unsigned(list.size()));
If I want to build a very simple array like:
int myArray[3] = {1,2,3};
Should I use std::array instead?
std::array<int, 3> a = {{1, 2, 3}};
What are the advantages of using std::array over usual ones? Is it more performant? Just easier to handle for copy/access?
What are the advantages of using std::array over usual ones?
It has friendly value semantics, so that it can be passed to or returned from functions by value. Its interface makes it more convenient to find the size, and use with STL-style iterator-based algorithms.
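For example (a small sketch of my own, using nothing beyond <array>, <algorithm> and <iostream>):
#include <algorithm>
#include <array>
#include <iostream>

int main() {
    std::array<int, 3> a = {{3, 1, 2}};
    std::sort(a.begin(), a.end());        // works with any iterator-based algorithm
    std::cout << a.size() << '\n';        // the size is always available, no sizeof tricks
    for (int x : a)
        std::cout << x << ' ';            // prints 1 2 3
}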
Is it more performant ?
It should be exactly the same. By definition, it's a simple aggregate containing an array as its only member.
Just easier to handle for copy/access ?
Yes.
A std::array is a very thin wrapper around a C-style array, basically defined as
template<typename T, size_t N>
struct array
{
    T _data[N];
    T& operator[](size_t);
    const T& operator[](size_t) const;
    // other member functions and typedefs
};
It is an aggregate, and it allows you to use it almost like a fundamental type (i.e. you can pass-by-value, assign etc, whereas a standard C array cannot be assigned or copied directly to another array). You should take a look at some standard implementation (jump to definition from your favourite IDE or directly open <array>), it is a piece of the C++ standard library that is quite easy to read and understand.
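A quick illustration of those value semantics (my own sketch; the C-array lines are commented out because they do not compile):
#include <array>

std::array<int, 3> make() { return {{1, 2, 3}}; }  // returned by value

int main() {
    std::array<int, 3> a = make();
    std::array<int, 3> b = a;  // copy-construction works
    b = a;                     // so does assignment
    // int c[3] = {1, 2, 3};
    // int d[3] = c;           // error: a C array cannot be copy-initialized from another
    // d = c;                  // error: a C array cannot be assigned
    (void)b;
}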
std::array is designed as zero-overhead wrapper for C arrays that gives it the "normal" value like semantics of the other C++ containers.
You should not notice any difference in runtime performance while you still get to enjoy the extra features.
Using std::array instead of int[] style arrays is a good idea if you have C++11 or boost at hand.
Is it more performant ?
It should be exactly the same. By definition, it's a simple aggregate containing an array as its only member.
The situation seems to be more complicated, as std::array does not always produce assembly code identical to a C-array's; it depends on the specific compiler.
I tested this specific situation on godbolt:
#include <array>

void test(double* const C, const double* const A,
          const double* const B, const size_t size) {
    for (size_t i = 0; i < size; i++) {
        // double arr[2] = {0.e0}; //
        std::array<double, 2> arr = {0.e0}; // different to double arr[2] for some compilers
        for (size_t j = 0; j < size; j++) {
            arr[0] += A[i] * B[j];
            arr[1] += A[j] * B[i];
        }
        C[i] += arr[0];
        C[i] += arr[1];
    }
}
GCC and Clang produce identical assembly code for both the C-array version and the std::array version.
MSVC and ICPC, however, produce different assembly code for each array version. (I tested ICPC19 with -Ofast and -Os; MSVC -Ox and -Os)
I have no idea why this is the case (I would indeed expect exactly identical behavior of std::array and C-array). Maybe there are different optimization strategies employed.
As a little extra:
There seems to be a bug in ICPC with
#pragma simd
for vectorization when using the c-array in some situations
(the c-array code produces a wrong output; the std::array version works fine).
Unfortunately, I do not have a minimal working example for that yet, since I discovered that problem while optimizing a quite complicated piece of code.
I will file a bug report with Intel when I am sure that I did not just misunderstand something about C-array/std::array and #pragma simd.
std::array has value semantics while raw arrays do not. This means you can copy std::array and treat it like a primitive value. You can receive them by value or reference as function arguments and you can return them by value.
If you never copy a std::array, then there is no performance difference than a raw array. If you do need to make copies then std::array will do the right thing and should still give equal performance.
You will get the same performance results using std::array and a C array.
If you run this code:
std::array<QPair<int, int>, 9> *m_array = new std::array<QPair<int, int>, 9>();
QPair<int, int> *carr = new QPair<int, int>[10];
QElapsedTimer timer;

timer.start();
for (int j = 0; j < 1000000000; j++)
{
    for (int i = 0; i < 9; i++)
    {
        m_array->operator[](i).first = i + j;
        m_array->operator[](i).second = j - i;
    }
}
qDebug() << "std::array<QPair<int, int>" << timer.elapsed() << "milliseconds";

timer.start();
for (int j = 0; j < 1000000000; j++)
{
    for (int i = 0; i < 9; i++)
    {
        carr[i].first = i + j;
        carr[i].second = j - i;
    }
}
qDebug() << "QPair<int, int> took" << timer.elapsed() << "milliseconds";
return 0;
You will get these results:
std::array<QPair<int, int> 5670 milliseconds
QPair<int, int> took 5638 milliseconds
Mike Seymour is right, if you can use std::array you should use it.
int *arr = (int*) malloc(100*sizeof(int));
int *arr_copy = (int*) malloc(100*sizeof(int));
srand(123456789L);
for( int i = 0; i < 100; i++) {
    arr[i] = rand();
    arr_copy[i] = arr[i];
}
// ------ do stuff with arr ------
// reset arr...
std::copy(arr_copy, arr_copy+100, arr);
while compiling this I get this warning for std::copy():
c:\program files (x86)\microsoft visual studio 10.0\vc\include\xutility(2227):
warning C4996: 'std::_Copy_impl': Function call with parameters that may be
unsafe - this call relies on the caller to check that the passed values are
correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See
documentation on how to use Visual C++ 'Checked Iterators'
I know how to disable/ignore the warning, but is there a simple one-liner solution to make a "checked iterator" out of an unchecked pointer? Something like (I know cout is not an unchecked pointer like int*, it's just an example):
ostream_iterator<int> out(cout," ");
std::copy(arr_copy, arr_copy+numElements, out);
I don't want to write a whole new specialized class my_int_arr_output_iterator : iterator.... But can I use one of the existing iterators?
---edit---
As there are many questions about my usage of C-style arrays and malloc instead of STL containers, let me just say that I'm writing a small program to test different sorting algorithms' performance and memory usage. The code snippet you see above is a specialized version (the original code is a template class with multiple methods, testing one algorithm for different numbers of elements in arrays of different types) specific to the problem.
In other words, I do know how to do this using STL containers (vector) and their iterators (vector::begin/end). What I didn't know is what I asked.
Thanks though, hopefully someone else would benefit from the answers if not me.
The direct answer you're looking for is stdext::checked_array_iterator. This can be used to wrap a pointer and its length into an MSVC checked iterator.
std::copy(arr_copy, arr_copy+100, stdext::checked_array_iterator<int*>(arr, 100) );
They also provide a stdext::checked_iterator which can wrap a non-checked container.
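On recent MSVC versions there is also a helper that deduces the iterator type for you (my addition, assuming the stdext extensions in MSVC's <iterator>):
#include <iterator>  // MSVC-specific: stdext::make_checked_array_iterator

std::copy(arr_copy, arr_copy + 100, stdext::make_checked_array_iterator(arr, 100));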
This is a "Mother, may I" warning: the code is correct, but the library writer thinks you're not smart enough to handle it. Turn off stupid warnings.
Here's one:
std::vector<int> arr(100);
std::vector<int> arr_copy(100);
srand(123456789L);
for( int i = 0; i < 100; i++) {
    arr[i] = rand();
    arr_copy[i] = arr[i];
}
//do stuff
std::copy(arr_copy.begin(), arr_copy.end(), arr.begin());
There is a limited portable solution to this problem.
It can be done with the help of the boost::filter_iterator adapter.
There are two limitations:
The iterator is bidirectional without random access. it++ and it-- work but it+=10 doesn't.
it = end(); int val = *it; is not checked and will assign garbage to val. This applies only to the element one past the last; all other iterator values are checked. To work around this limitation, I would always advance the iterator after using its value, so that after consuming the last value it points to end(). Then it = end() - 1; int val1 = *it++; int val2 = *it++; // segfault or failing assert on this line. Either way the error will not go unnoticed.
The solution:
filter_iterator uses a user-defined predicate to control which elements are skipped. We can define a predicate that does not skip any elements but asserts, in debug mode, when the iterator is out of range. There is no performance penalty because in release mode the predicate simply returns true and is optimized out by the compiler. Below is the code:
// only header is required
#include "boost/iterator/filter_iterator.hpp"
// ...
const int arr[] = {1, 2, 3, 4, 5};
const int length = sizeof(arr)/sizeof(int);
const int *begin = arr;
const int *end = arr + length;
auto range_check = [begin, end](const int &t)
{
    assert(&t >= begin && &t < end);
    return true;
};
typedef boost::filter_iterator<decltype(range_check), const int *> CheckedIt;
std::vector<int> buffer;
std::back_insert_iterator<std::vector<int>> target_it(buffer);
std::copy(CheckedIt(range_check, begin, end), CheckedIt(range_check, end, end), target_it);
for(auto c : buffer)
    std::cout << c << std::endl;
auto it = CheckedIt(range_check, begin, end);
it--; // assertion fails

auto it_end = CheckedIt(range_check, end - 1, end);
it_end++;
std::cout << *it_end; // garbage out
it_end++; // assertion fails.
For portability you could use
template <class T>
T* cloneArray(T *a, int length) {
    T *b = new T[length];
    for (int i = 0; i < length; i++) b[i] = a[i];
    return b;
}
You can tweak it to change the behaviour to copy one array to another.
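As mentioned, the tweak for copying into an existing array could look like this (a sketch; the name copyArray is my own):
template <class T>
void copyArray(T *dst, const T *src, int length) {
    for (int i = 0; i < length; i++) dst[i] = src[i];
}

// usage, matching the example above:
// copyArray(arr, arr_copy, 100);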
I work with a lot of calculation code written in C++ with high performance and low memory overhead in mind. It uses STL containers (mostly std::vector) a lot, and iterates over those containers in almost every single function.
The iterating code looks like this:
for (int i = 0; i < things.size(); ++i)
{
// ...
}
But it produces the signed/unsigned mismatch warning (C4018 in Visual Studio).
Replacing int with some unsigned type is a problem because we frequently use OpenMP pragmas, and it requires the counter to be int.
I'm about to suppress the (hundreds of) warnings, but I'm afraid I've missed some elegant solution to the problem.
On iterators. I think iterators are great when applied in appropriate places. The code I'm working with will never change random-access containers into std::list or something (so iterating with int i is already container agnostic), and will always need the current index. And all the additional code you need to type (iterator itself and the index) just complicates matters and obfuscates the simplicity of the underlying code.
It's all in the type of things.size(). It isn't int but std::vector::size_type, which is usually std::size_t, i.e. some "usual" unsigned type (unsigned int on x86-32, for example).
Operator "less" (<) cannot meaningfully compare a signed and an unsigned operand directly; the usual arithmetic conversions convert the signed operand to unsigned (so a negative value would silently become a huge positive one), which is why the compiler emits that warning.
It would be correct to write it like
for (size_t i = 0; i < things.size(); ++i) { /**/ }
or even faster
for (size_t i = 0, ilen = things.size(); i < ilen; ++i) { /**/ }
Ideally, I would use a construct like this instead:
for (std::vector<your_type>::const_iterator i = things.begin(); i != things.end(); ++i)
{
    // if you ever need the distance, you may call std::distance
    // it won't cause any overhead because the compiler will likely optimize the call
    size_t distance = std::distance(things.begin(), i);
}
This has the neat advantage that your code suddenly becomes container-agnostic.
And regarding your problem, if some library you use requires you to use int where an unsigned int would fit better, their API is messy. Anyway, if you are sure those ints are always positive, you may just do:
int int_distance = static_cast<int>(distance);
Which will specify clearly your intent to the compiler: it won't bug you with warnings anymore.
If you can't/won't use iterators and if you can't/won't use std::size_t for the loop index, make a .size() to int conversion function that documents the assumption and does the conversion explicitly to silence the compiler warning.
#include <cassert>
#include <cstddef>
#include <limits>
// When using int loop indexes, use size_as_int(container) instead of
// container.size() in order to document the inherent assumption that the size
// of the container can be represented by an int.
template <typename ContainerType>
/* constexpr */ int size_as_int(const ContainerType &c) {
    const auto size = c.size(); // if no auto, use `typename ContainerType::size_type`
    assert(size <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    return static_cast<int>(size);
}
Then you write your loops like this:
for (int i = 0; i < size_as_int(things); ++i) { ... }
The instantiation of this function template will almost certainly be inlined. In debug builds, the assumption will be checked. In release builds, it won't be and the code will be as fast as if you called size() directly. Neither version will produce a compiler warning, and it's only a slight modification to the idiomatic loop.
If you want to catch assumption failures in the release version as well, you can replace the assertion with an if statement that throws something like std::out_of_range("container size exceeds range of int").
Note that this solves both the signed/unsigned comparison as well as the potential sizeof(int) != sizeof(Container::size_type) problem. You can leave all your warnings enabled and use them to catch real bugs in other parts of your code.
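For completeness, the release-checked variant mentioned above could look like this (a sketch, with a name of my own choosing):
#include <cstddef>
#include <limits>
#include <stdexcept>

template <typename ContainerType>
int checked_size_as_int(const ContainerType &c) {
    const auto size = c.size();
    if (size > static_cast<std::size_t>(std::numeric_limits<int>::max()))
        throw std::out_of_range("container size exceeds range of int");
    return static_cast<int>(size);
}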
You can use:
size_t type, to remove warning messages
iterators + distance (as in the first hint)
only iterators
function object
For example:
#include <algorithm>
#include <iostream>
#include <vector>

// simple class that outputs its value
class ConsoleOutput
{
public:
    ConsoleOutput(int value) : m_value(value) { }
    int Value() const { return m_value; }
private:
    int m_value;
};
// functional object
class Predicat
{
public:
    void operator()(ConsoleOutput const& item)
    {
        std::cout << item.Value() << std::endl;
    }
};
int main()
{
    // fill list
    std::vector<ConsoleOutput> list;
    list.push_back(ConsoleOutput(1));
    list.push_back(ConsoleOutput(8));

    // 1) using size_t
    for (size_t i = 0; i < list.size(); ++i)
    {
        std::cout << list.at(i).Value() << std::endl;
    }

    // 2) iterators + distance, for std::distance only non const iterators
    std::vector<ConsoleOutput>::iterator itDistance = list.begin(), endDistance = list.end();
    for ( ; itDistance != endDistance; ++itDistance)
    {
        // int or size_t
        int const position = static_cast<int>(std::distance(list.begin(), itDistance));
        std::cout << list.at(position).Value() << std::endl;
    }

    // 3) iterators
    std::vector<ConsoleOutput>::const_iterator it = list.begin(), end = list.end();
    for ( ; it != end; ++it)
    {
        std::cout << (*it).Value() << std::endl;
    }

    // 4) functional objects
    std::for_each(list.begin(), list.end(), Predicat());
}
C++20 now has std::cmp_less
In C++20, we have the standard constexpr functions
std::cmp_equal
std::cmp_not_equal
std::cmp_less
std::cmp_greater
std::cmp_less_equal
std::cmp_greater_equal
added in the <utility> header, exactly for this kind of scenario.
Compare the values of two integers t and u. Unlike builtin comparison operators, negative signed integers always compare less than (and not equal to) unsigned integers: the comparison is safe against lossy integer conversion.
That means, if (for some weird reason) one must use int for the loop counter i and needs to compare it with an unsigned integer, that can be done:
#include <utility> // std::cmp_less
for (int i = 0; std::cmp_less(i, things.size()); ++i)
{
// ...
}
This also covers the case where we mistakenly convert -1 (an int) to an unsigned int. That means the following will not give you an error:
static_assert(1u < -1);
But the usage of std::cmp_less will
static_assert(std::cmp_less(1u, -1)); // error
I can also propose the following solution for C++11.
for (auto p = 0U; p < sys.size(); p++) {
}
(With auto p = 0, the type would be deduced as int and the warning would come back, so I have to write p = 0U....)
I will give you a better idea
for(decltype(things.size()) i = 0; i < things.size(); i++){
//...
}
decltype is
Inspects the declared type of an entity or the type and value category
of an expression.
So it deduces the type of things.size(), and i will have the same type as things.size(). As a result,
i < things.size() will be executed without any warning.
I had a similar problem. Using size_t was not working, so I tried this other approach, which worked for me (as below):
for (int i = things.size() - 1; i >= 0; i--)
{
    //...
}
I would just do
int pnSize = primeNumber.size();
for (int i = 0; i < pnSize; i++)
    cout << primeNumber[i] << ' ';
I have written this function
vector<long int>* randIntSequence(long int n) {
    vector<long int> *buffer = new vector<long int>(n, 0);
    for(long int i = 0; i < n; i++)
        buffer->at(i);

    long int j; MTRand myrand;
    for(long int i = buffer->size() - 1; i >= 1; i--) {
        j = myrand.randInt(i);
        swap(buffer[i], buffer[j]);
    }
    return buffer;
}
but when I call it from main, myvec = randIntSequence(10), I see that myvec is always empty. Should I modify the return value?
The swap call is indexing the *buffer pointer as if it were an array and is swapping around pointers. You mean to swap around the items of the vector. Try this modification:
swap((*buffer)[i], (*buffer)[j]);
Secondary to that, your at calls don't set the values as you expect. You are pulling out the items in the vector but not setting them to anything. Try one of these statements:
buffer->at(i) = i;
(*buffer)[i] = i;
You never assign to any of the elements in the vector pointed to by buffer:
for (long int i = 0; i < n; i++)
    buffer->at(i); // do you mean to assign something here?
You end up with the vector containing n zeroes.
Your question has already been answered, so I'll make this CW, but this is how your code should look.
std::vector<long int> randIntSequence(long int n)
{
    std::vector<long int> buffer(n);
    for(int i = 0; i < n; ++i)
        buffer[i] = i;

    std::random_shuffle(buffer.begin(), buffer.end());
    return buffer;
}
There is absolutely no reason you should be using a pointer here. And unless you have some more advanced method of random shuffling, you should be using std::random_shuffle. You might also consider using boost::counting_iterator to initialize the vector:
std::vector<long int> buffer(
    boost::counting_iterator<long int>(0),
    boost::counting_iterator<long int>(n));
Though that may be overkill.
Since the question is about the STL, and all you want is a vector with random entries, then:
std::vector<long int> v(10);
generate( v.begin(), v.end(), std::rand ); // range is [0,RAND_MAX]
// or if you provide long int MTRand::operator()()
generate( v.begin(), v.end(), MTRand() );
But if you want to fix your function then
n should be size_t not long int
First loop is no-op
As John is saying, buffer is a pointer, so buffer[0] is your vector, and buffer[i] for i!=0 is garbage. It seems you have been very lucky to get a zero-sized vector back instead of a corrupt one!
Is your intention to do random shuffle? If yes, you are shuffling around zeros. If you just want to generate random entries then why don't you just loop the vector (from 0 to buffer->size(), not the other way around!!) and assign your random number?
C++ is not garbage collected, and you probably don't want smart pointers for such simple stuff, so you'll be sure to end up with leaks. If the reason for creating the vector on the heap and returning it by pointer is to avoid a copy for performance's sake, then my advice is: don't do it! The following is the (almost) perfect alternative, both for clarity and for performance:
vector<T> randIntSequence( size_t n ) {
    vector<T> buffer(n);
    // bla-bla
    return buffer;
}
If you think there is excess copying around in here, read this and trust your compiler.