The following function compares two arrays and returns true if all elements are equal, taking a tolerance into account.
// Equal
template<typename Type>
bool eq(const unsigned int n, const Type* x, const Type* y, const Type tolerance)
{
    bool ok = true;
    for (unsigned int i = 0; i < n; ++i) {
        if (std::abs(x[i] - y[i]) > std::abs(tolerance)) {
            ok = false;
            break;
        }
    }
    return ok;
}
Is there a way to beat the performance of this function?
Compute abs(tolerance) outside the loop.
You might try unrolling the loop into a 'major' loop and a 'minor' loop where the 'minor' loop's only jump is to its beginning and the 'major' loop has the 'if' and 'break' stuff. Do something like ok &= (x[i]-y[i] < abstol) & (y[i]-x[i] < abstol); in the minor loop to avoid branching -- note & instead of &&.
Then partially unroll and vectorise the minor loop. Then specialise for whatever floating-point types you're actually using and use your platform's SIMD instructions to do the minor loop.
Think before doing this, of course, since it can increase code size and thereby have ill effects on maintainability and sometimes the performance of other parts of your system.
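A minimal sketch of that major/minor split, assuming a chunk size of 4 (the chunk size and the name eq_unrolled are illustrative only; tune and measure on your platform):

```cpp
#include <cmath>

// "Major" loop steps in chunks and carries the only data-dependent branch;
// the "minor" loop folds comparisons with bitwise & so it never branches.
template <typename Type>
bool eq_unrolled(unsigned int n, const Type* x, const Type* y, Type tolerance)
{
    const Type abstol = std::abs(tolerance);
    unsigned int i = 0;
    for (; i + 4 <= n; i += 4) {               // major loop: one branch per chunk
        bool ok = true;
        for (unsigned int j = 0; j < 4; ++j)   // minor loop: branch-free body
            ok &= (x[i+j] - y[i+j] <= abstol) & (y[i+j] - x[i+j] <= abstol);
        if (!ok) return false;
    }
    for (; i < n; ++i)                         // remainder elements
        if (std::abs(x[i] - y[i]) > abstol) return false;
    return true;
}
```

The minor loop's trip count is a compile-time constant, which is what gives the compiler room to unroll and vectorise it.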
You can avoid those return variable assignments, and precalculate the absolute value of tolerance:
// Equal
template<typename Type>
bool eq(const unsigned int n, const Type* x, const Type* y, const Type tolerance) {
const Type absTolerance = std::abs(tolerance);
for(unsigned int i = 0; i < n; ++i) {
if (std::abs(x[i]-y[i]) > absTolerance) {
return false;
}
}
return true;
}
Also, if you know the tolerance will always be positive, there's no need to calculate its absolute value. If not, you may take it as a precondition.
I would do it like this; you can also roll a C++03 version with class functors, which will be more verbose but should be equally efficient:
std::equal(x, x + n, y, [&tolerance](Type a, Type b) -> bool { return ((a - b) < tolerance) && ((a - b) > -tolerance); });
The major difference is dropping the abs: depending on Type and how abs is implemented, you might get an extra conditional execution path with lots of branch mispredictions; this should avoid that. The duplicate calculation of a-b will likely be optimized away by the compiler (if it deems that necessary).
Of course, it introduces an extra operator requirement for Type, and if operator < or > is slow, it might be slower than abs (measure it).
Also, std::equal is a standard algorithm doing all that looping and early breaking for you; it's always a good idea to use the standard library for this. It's usually nicer to maintain (in C++11 at least) and could get optimized better because you clearly show intent.
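The C++03 functor version mentioned above might look like this (a sketch; the functor name WithinTolerance is made up for illustration):

```cpp
#include <algorithm>

// Hand-written functor class: the C++03 equivalent of the lambda.
// More verbose, but the compiler can inline operator() just as well.
template <typename Type>
struct WithinTolerance {
    Type tol;
    explicit WithinTolerance(Type t) : tol(t) {}
    bool operator()(Type a, Type b) const {
        return (a - b) < tol && (a - b) > -tol;
    }
};

template <typename Type>
bool eq(unsigned int n, const Type* x, const Type* y, Type tolerance)
{
    return std::equal(x, x + n, y, WithinTolerance<Type>(tolerance));
}
```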
I would like to use a class with the same functionality as std::vector, but
Replace std::vector<T>::size_type by some signed integer (like int64_t or simply int), instead of usual size_t. It is very annoying to see warnings produced by a compiler in comparisons between signed and unsigned numbers when I use standard vector interface. I can't just disable such warnings, because they really help to catch programming errors.
put assert(0 <= i && i < size()); inside operator[](int i) to check out of range errors. As I understand it will be a better option over the call to .at() because I can disable assertions in release builds, so performance will be the same as in the standard implementation of the vector. It is almost impossible for me to use std::vector without manual checking of range before each operation because operator[] is the source of almost all weird errors related to memory access.
The possible options that come to my mind are to
Inherit from std::vector. It is not a good idea, as said in the following question: Extending the C++ Standard Library by inheritance?.
Use composition (put std::vector inside my class) and repeat the whole interface of std::vector. This option forces me to track the current C++ standard, because the interface of some methods and iterators differs slightly between C++98, 11, 14, and 17. I would like to be sure that when C++20 becomes available, I can simply use it without reimplementing the whole interface of my vector.
An answer aimed more at the underlying problem, read from the comment:
For example, I don't know how to write in a ranged-based for way:
for (int i = a.size() - 2; i >= 0; i--) { a[i] = 2 * a[i+1]; }
You may change it to a generic one like this:
#include <iostream>
#include <vector>

std::vector<int> vec1{ 1, 2, 3, 4, 5, 6 };
std::vector<int> vec2 = vec1;

int main()
{
    // generic code
    for (auto it = vec1.rbegin() + 1; it != vec1.rend(); it++) {
        *it = 2 * *(it - 1);
    }

    // your code
    for (int i = vec2.size() - 2; i >= 0; i--) {
        vec2[i] = 2 * vec2[i + 1];
    }

    for (auto& el : vec1) { std::cout << el << std::endl; }
    for (auto& el : vec2) { std::cout << el << std::endl; }
}
I'm not using a range-based for here, as it cannot access elements relative to the current position.
Regarding point 1: we hardly ever get those warnings here, because we use vectors' size_type where appropriate and/or cast to it if needed (with a 'checked' cast like boost::numeric_cast for safety). Is that not an option for you? Otherwise, write a function to do it for you, i.e. the non-const version would be something like
template<class T>
T& ati(std::vector<T>& v, std::int64_t i)
{
    return v.at(checked_cast<typename std::vector<T>::size_type>(i));
}
(Note that decltype(v) would be a reference type here, so the member type has to be spelled out; checked_cast stands for whatever checked conversion you use, e.g. boost::numeric_cast.)
And yes, inheriting is still a problem. And even if it weren't you'd break the definition of vector (and the Liskov substitution principle I guess), because the size_type is defined as
an unsigned integral type that can represent any non-negative value of difference_type
So it's down to composition, or a bunch of free functions for accessing with a signed size_type and a range check. Personally I'd go for the latter: less work, as easy to use, and you can still pass your vector to functions taking vectors without problems.
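If you don't want a Boost dependency, a boost-free sketch of the same accessor can range-check the signed index directly (the assert-based check is my substitution for the checked cast):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Free function for signed indexing into a vector: the assert replaces
// the checked cast, and release builds fall back to unchecked access.
template <class T>
T& ati(std::vector<T>& v, std::int64_t i)
{
    assert(i >= 0 && static_cast<std::uint64_t>(i) < v.size());
    return v[static_cast<typename std::vector<T>::size_type>(i)];
}
```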
(This is more a comment than a real answer, but has some code, so...)
For the second part (range checking at runtime), a third option would be to use some macro trick:
#ifdef DEBUG
#define VECTOR_AT(v,i) v.at(i)
#else
#define VECTOR_AT(v,i) v[i]
#endif
This can be used this way:
std::vector<sometype> vect(somesize);
VECTOR_AT(vect,i) = somevalue;
Of course, this requires editing your code in a quite non-standard way, which may not be an option. But it does the job.
I have two arrays comprising x,y values for y=f(x). I would like to provide a function that finds the value of x that corresponds to either the min or max sampled value of y.
What is an efficient way to select proper comparison operator before looping over the values in the arrays?
For example, I would like to do something like the following:
double FindExtremum(const double* x, const double* y,
                    const unsigned int n, const bool isMin) {
    static std::less<double> lt;
    static std::greater<double> gt;
    std::binary_function<double,double,bool>& IsBeyond = isMin ? lt : gt;
    double xm(*x), ym(*y);
    for (unsigned int i = 0; i < n; ++i, ++x, ++y) {
        if (IsBeyond(*y, ym)) {
            ym = *y;
            xm = *x;
        }
    }
    return xm;
}
Unfortunately, the base class std::binary_function does not define a virtual operator().
Will a compiler like g++ 4.8 be able to optimize the most straightforward implementation?
double FindExtremum(const double* x, const double* y,
                    const unsigned int n, const bool isMin) {
    double xm(*x), ym(*y);
    for (unsigned int i = 0; i < n; ++i, ++x, ++y) {
        if (( isMin && (*y < ym)) ||
            (!isMin && (*y > ym))) {
            ym = *y;
            xm = *x;
        }
    }
    return xm;
}
Is there another way to arrange things to make it easy for the compiler to optimize? Is there a well known algorithm for doing this?
I would prefer to avoid using a templated function, if possible.
You would need to pass the comparison functor as a templated function parameter, e.g.
template <typename Compare>
double FindExtremum(const double* x, const double* y,
                    const unsigned int n, Compare compare) {
    double xm(*x), ym(*y);
    for (unsigned int i = 0; i < n; ++i, ++x, ++y) {
        if (compare(*y, ym)) {
            ym = *y;
            xm = *x;
        }
    }
    return xm;
}
Then if you need runtime choice, write something like this:
if (isMin) {
FindExtremum(x, y, n, std::less<double>());
} else {
FindExtremum(x, y, n, std::greater<double>());
}
Avoiding a templated function is not really possible in this case. The best performing code will be one that embeds the comparison operation directly in the loop, avoiding a function call - you can either write a template or write two copies of this function. A templated function is clearly the better solution.
For ultimate efficiency, make the comparison operator or the comparison operator choice a template parameter, and don't forget to measure.
When striving for utmost micro-efficiency, doing virtual calls is not in the direction of the goal.
That said, this is most likely a case of premature optimization, which Donald Knuth described thusly:
“Premature optimization is the root of all evil”
(I omitted his reservations, it sounds more forceful that way. :-) )
Instead of engaging in micro-optimization frenzy, which gains you little if anything, and wastes your time, I recommend more productively trying to make the code as clear and provably correct as possible. For example, use std::vector instead of raw arrays and separately passed sizes. And, for example, don't call the boolean comparison operator compare, as recommended in another answer, since that's the conventional name for tri-valued compare (e.g. as in std::string::compare).
Some questions arise here. First, I think you're overcomplicating the situation. For example, it would be easier to have two functions, one that calculates the min and another that calculates the max, and then call either of them depending on the value of isMin.
What's more, note how every iteration tests whether isMin is true (at least in the "optimized" code you show last), when that comparison could have been done just once.
Now, if isMin can be deduced in any way at compile time, you can use a template class that selects the correct implementation optimized for the case, and without any run-time overhead (not tested, written from memory):
template<bool isMin>
class ExtremeFinder
{
public:
    static double FindExtreme(const double* x, const double* y,
                              const unsigned int n)
    {
        // Version that calculates when isMin is false
    }
};

template<>
class ExtremeFinder<true>
{
public:
    static double FindExtreme(const double* x, const double* y,
                              const unsigned int n)
    {
        // Version that calculates when isMin is true
    }
};
and call it as ExtremeFinder<test_to_know_isMin>::FindExtreme(...);, or, if you cannot decide it at compile time, you can always do:
if (isMin_should_be_true)
ExtremeFinder<true>::FindExtreme(...);
else
ExtremeFinder<false>::FindExtreme(...);
If you had 2 disjoint criteria, e.g. < and >=, you could use a bool less function argument and use XOR in the loop:
if (less ^ (a>=b))
Don't know about performance, but is easy to write.
Or, for the not-covering-all-possibilities disjoint pair < and >:
if ((a != b) && (less ^ (a > b)))
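The XOR dispatch can be sketched as a tiny helper (the name beyond is made up for illustration): one boolean flag flips between the two disjoint comparisons without an if/else on each iteration.

```cpp
// less == true  -> behaves like (a < b)
// less == false -> behaves like (a >= b)
inline bool beyond(double a, double b, bool less)
{
    return less ^ (a >= b);
}
```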
I am considering using C++ for a performance-critical application. I thought both C and C++ would have comparable running times. However, I see that the C++ function takes >4 times as long to run as the comparable C snippet.
When I did a disassembly, I saw that end(), ++, and != were all implemented as function calls. Is it possible to make them (at least some of them) inline?
Here is the C++ code:
#include <list>

typedef struct pfx_s {
    unsigned int start;
    unsigned int end;
    unsigned int count;
} pfx_t;

typedef std::list<pfx_t *> pfx_list_t;
typedef pfx_list_t::const_iterator const_list_iter_t;

int
eval_one_pkt (pfx_list_t *cfg, unsigned int ip_addr)
{
    const_list_iter_t iter;
    for (iter = cfg->begin(); iter != cfg->end(); iter++) {
        if (((*iter)->start <= ip_addr) &&
            ((*iter)->end >= ip_addr)) {
            (*iter)->count++;
            return 1;
        }
    }
    return 0;
}
And this is the equivalent C code:
int
eval_one_pkt (cfg_t *cfg, unsigned int ip_addr)
{
    pfx_t *pfx;
    TAILQ_FOREACH (pfx, &cfg->pfx_head, next) {
        if ((pfx->start <= ip_addr) &&
            (pfx->end >= ip_addr)) {
            pfx->count++;
            return 1;
        }
    }
    return 0;
}
It might be worth noting that the data structures you used are not entirely equivalent. Your C list is implemented as a list of immediate elements. Your C++ list is a list of pointers to the actual elements. Why did you make your C++ list a list of pointers?
This alone will not, of course, cause a four-fold difference in performance. However, it could affect the code's performance due to its worse memory locality.
I would guess that you timed debug version of your code, maybe even compiled with debug version of the library.
Do you have a really good reason to use a list here at all? At first glance, it looks like a std::vector will be a better choice. You probably also don't want a container of pointers, just a container of objects.
You can also do the job quite a bit more neatly with a standard algorithm:
#include <algorithm>
#include <vector>

typedef std::vector<pfx_t> pfx_list_t;

int
eval_one_pkt(pfx_list_t const &cfg, unsigned int ip_addr) {
    auto pos = std::find_if(cfg.begin(), cfg.end(),
        [ip_addr](pfx_t const &p) {
            return ip_addr >= p.start && ip_addr <= p.end;
        });
    if (pos != cfg.end()) {
        ++(pos->count);
        return 1;
    }
    return 0;
}
If I were doing it, however, I'd probably turn that into a generic algorithm instead:
template <class InIter>
int
eval_one_pkt(InIter b, InIter e, unsigned int ip_addr) {
    auto pos = std::find_if(b, e,
        [ip_addr](pfx_t const &p) {
            return ip_addr >= p.start && ip_addr <= p.end;
        });
    if (pos != e) {
        ++(pos->count);
        return 1;
    }
    return 0;
}
Though unrelated to C vs. C++, for a possible slight further optimization on the range check you might want to try something like this:
return ((unsigned)(ip_addr - p.start) <= (p.end - p.start));
With a modern compiler with optimization enabled, I'd expect the template to be expanded inline entirely at the point of use, so there probably wouldn't be any function calls involved at all.
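The single-comparison range check above works because subtracting start and comparing as unsigned also rejects addresses below start: the subtraction wraps around to a huge unsigned value. A small sketch (the name in_range is made up for illustration):

```cpp
// One comparison instead of two: values below start wrap to a large
// unsigned number and therefore fail the <= test.
inline bool in_range(unsigned int ip_addr, unsigned int start, unsigned int end)
{
    return (ip_addr - start) <= (end - start);
}
```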
I copied your code and ran timings of 10,000 failed (thus complete) searches of 10,000 element lists:
Without optimization:
TAILQ_FOREACH 0.717s
std::list<pfx_t *> 2.397s
std::list<pfx_t> 1.98s
(Note that I put a next into pfx_t for TAILQ and used the same redundant structure with std::list)
You can see that lists of pointers is worse than lists of objects. Now with optimization:
TAILQ_FOREACH 0.467s
std::list<pfx_t *> 0.553s
std::list<pfx_t> 0.345s
So as everyone pointed out, optimization is the dominant term in a tight inner loop using collection types. Even the slowest variation is faster than the fastest unoptimized version. Perhaps more surprising is that the winner changes -- this is likely due to the compiler better recognizing optimization opportunities in the std code than in an OS-provided macro.
// All right? Is this really good working code?
// Need to init the array with the value "false"
bool **Madj;
int NodeCount = 4;
bool **Madj = new bool*[NodeCount];
for (int i = 0; i < NodeCount; i++) {
    Madj[i] = new bool[NodeCount];
    for (int j = 0; j < NodeCount; j++) {
        Madj[i][j] = false;
    }
}
You could consider using Boost's builtin multi-dimensional array as a less brittle alternative. As noted the code you supplied will work, but has issues.
What about:
std::vector<std::vector<bool> > Madj(4, std::vector<bool>(4, false));
Unfortunately std::vector<bool> is specialized to optimize for size (not speed).
So it can be inefficient (especially if used a lot). So you could use an int array (if you find the bool version is slowing you down).
std::vector<std::vector<int> > Madj(4, std::vector<int>(4, 0));
Note: int can be used in a boolean context and is converted automatically (0 => false, any other number is true; though it's best to use 1).
At least IMO, if you insist on doing this at all, you should normally do it rather differently, something like:
#include <algorithm>

class bool_array {
    bool *data_;
    size_t width_;
    // no assignment or copying
    bool_array &operator=(bool_array const &);
    bool_array(bool_array const &);
public:
    bool_array(size_t x, size_t y) : width_(x) {
        data_ = new bool[x*y];
        std::fill_n(data_, x*y, false);
    }
    bool &operator()(size_t x, size_t y) {
        return data_[y*width_+x];
    }
    ~bool_array() { delete [] data_; }
};
This can be embellished (e.g., using a proxy to enforce constness), but the general idea remains: 1) allocate your bools in a single block, and 2) put them into a class, and 3) overload an operator to support reasonably clean indexing into the data.
You should also consider using std::vector<bool>. Unlike other instantiations of std::vector, it's not a container (as the standard defines that term), which can be confusing -- but what you're creating isn't a container either, so that apparently doesn't matter to you.
bool **Madj = new bool*[NodeCount];
for (int i=0; i<NodeCount; i++){
Madj[i] = new bool [NodeCount];
for (int j=0; j<NodeCount; j++){
Madj[i][j] = false;
}
}
If the first call to new succeeds but any of the ones in the loop fails, you have a memory leak since Madj and the subarrays up to the current i are not deleted. Use a vector<vector<bool> >, or a vector<bool> of size NodeCount * NodeCount. With the latter option, you can get to element (i,j) with [i*NodeCount+j].
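A minimal sketch of that flat single-allocation layout (the AdjMatrix name is made up for illustration): one vector of NodeCount * NodeCount cells, element (i,j) at index i*NodeCount + j, with no leak risk because the vector owns the storage.

```cpp
#include <vector>

// Flat adjacency matrix: a single allocation, exception-safe, and
// element (i, j) lives at index i * n + j.
struct AdjMatrix {
    int n;
    std::vector<bool> cells;
    explicit AdjMatrix(int nodeCount)
        : n(nodeCount), cells(nodeCount * nodeCount, false) {}
    std::vector<bool>::reference at(int i, int j) { return cells[i * n + j]; }
};
```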
I think this looks fine!
Depending on the use, you could use std::vector instead of a raw array.
But it's true that the first Madj declaration should be "extern" to avoid linking or shadowing errors.
If you have only bools, consider using bitsets. You can combine that with other containers for multidimensional arrays, e.g. vector<bitset>.
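A sketch of the bitset idea for a fixed-size adjacency matrix; note that std::bitset needs its size at compile time, so this assumes NodeCount is a constant (the AdjRows alias is made up for illustration):

```cpp
#include <array>
#include <bitset>
#include <cstddef>

// Each row is a bitset of NodeCount bits; all bits default to false.
const std::size_t NodeCount = 4;
using AdjRows = std::array<std::bitset<NodeCount>, NodeCount>;
```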
I myself am convinced that in a project I'm working on signed integers are the best choice in the majority of cases, even though the value contained within can never be negative. (Simpler reverse for loops, less chance for bugs, etc., in particular for integers which can only hold values between 0 and, say, 20, anyway.)
The majority of the places where this goes wrong is a simple iteration over a std::vector; often these used to be arrays that were later changed to std::vector. So these loops generally look like this:
for (int i = 0; i < someVector.size(); ++i) { /* do stuff */ }
Because this pattern is used so often, the amount of compiler warning spam about this comparison between signed and unsigned types tends to hide more useful warnings. Note that we definitely do not have vectors with more than INT_MAX elements, and note that until now we used two ways to fix the compiler warning:
for (unsigned i = 0; i < someVector.size(); ++i) { /*do stuff*/ }
This usually works but might silently break if the loop contains any code like 'if (i-1 >= 0) ...', etc.
for (int i = 0; i < static_cast<int>(someVector.size()); ++i) { /*do stuff*/ }
This change does not have any side effects, but it does make the loop a lot less readable. (And it's more typing.)
So I came up with the following idea:
template <typename T> struct vector : public std::vector<T>
{
    typedef std::vector<T> base;

    int size() const     { return base::size(); }
    int max_size() const { return base::max_size(); }
    int capacity() const { return base::capacity(); }

    vector() : base() {}
    vector(int n) : base(n) {}
    vector(int n, const T& t) : base(n, t) {}
    vector(const base& other) : base(other) {}
};

template <typename Key, typename Data> struct map : public std::map<Key, Data>
{
    typedef std::map<Key, Data> base;
    typedef typename base::key_compare key_compare;

    int size() const     { return base::size(); }
    int max_size() const { return base::max_size(); }
    int erase(const Key& k) { return base::erase(k); }
    int count(const Key& k) { return base::count(k); }

    map() : base() {}
    map(const key_compare& comp) : base(comp) {}
    template <class InputIterator> map(InputIterator f, InputIterator l) : base(f, l) {}
    template <class InputIterator> map(InputIterator f, InputIterator l, const key_compare& comp) : base(f, l, comp) {}
    map(const base& other) : base(other) {}
};
// TODO: similar code for other container types
What you see is basically the STL classes with the methods that return size_type redefined to return plain 'int'. The constructors are needed because they aren't inherited.
What would you think of this as a developer, if you'd see a solution like this in an existing codebase?
Would you think 'whaa, they're redefining the STL, what a huge WTF!', or would you think this is a nice simple solution to prevent bugs and increase readability. Or maybe you'd rather see we had spent (half) a day or so on changing all these loops to use std::vector<>::iterator?
(In particular if this solution was combined with banning the use of unsigned types for anything but raw data (e.g. unsigned char) and bit masks.)
Don't derive publicly from STL containers. They have non-virtual destructors, which invokes undefined behaviour if anyone deletes one of your objects through a pointer to base. If you must derive, e.g. from a vector, do it privately and expose the parts you need with using-declarations.
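A sketch of that private-inheritance approach (the safe_vector name is made up for illustration): nobody can delete through a std::vector<T>* because the base is inaccessible, and only the needed members are re-exposed.

```cpp
#include <vector>

// Private inheritance blocks deletion through a base pointer; the
// using-declarations re-expose selected members unchanged.
template <typename T>
class safe_vector : private std::vector<T> {
    typedef std::vector<T> base;
public:
    using base::push_back;
    using base::operator[];
    using base::begin;
    using base::end;
    int size() const { return static_cast<int>(base::size()); }
};
```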
Here, I'd just use a size_t as the loop variable. It's simple and readable. The poster who commented that using an int index exposes you as a n00b is correct. However, using an iterator to loop over a vector exposes you as a slightly more experienced n00b - one who doesn't realize that the subscript operator for vector is constant time. (vector<T>::size_type is accurate, but needlessly verbose IMO).
While I don't think "use iterators, otherwise you look n00b" is a good solution to the problem, deriving from std::vector appears much worse than that.
First, developers do expect vector to be std::vector, and map to be std::map. Second, your solution does not scale to other containers, or to other classes/libraries that interact with containers.
Yes, iterators are ugly, iterator loops are not very well readable, and typedefs only cover up the mess. But at least, they do scale, and they are the canonical solution.
My solution? an stl-for-each macro. That is not without problems (mainly, it is a macro, yuck), but it gets across the meaning. It is not as advanced as e.g. this one, but does the job.
I made this community wiki... Please edit it. I don't agree with the advice against "int" anymore. I now see it as not bad.
Yes, I agree with Richard. You should never use 'int' as the counting variable in a loop like those. The following is how you might do various loops using indices (although there is little reason to; occasionally this can be useful).
Forward
for(std::vector<int>::size_type i = 0; i < someVector.size(); i++) {
/* ... */
}
Backward
You can do this, which is perfectly defined behavior:
for(std::vector<int>::size_type i = someVector.size() - 1;
i != (std::vector<int>::size_type) -1; i--) {
/* ... */
}
Soon, with C++1x (the next C++ version) coming along nicely, you can do it like this:
for(auto i = someVector.size() - 1; i != (decltype(i)) -1; i--) {
/* ... */
}
Decrementing below 0 will cause i to wrap around, because it is unsigned.
But unsigned will make bugs slurp in
That should never be an argument to make it the wrong way (using 'int').
Why not use std::size_t above?
The C++ Standard defines in 23.1 p5 Container Requirements, that T::size_type , for T being some Container, that this type is some implementation defined unsigned integral type. Now, using std::size_t for i above will let bugs slurp in silently. If T::size_type is less or greater than std::size_t, then it will overflow i, or not even get up to (std::size_t)-1 if someVector.size() == 0. Likewise, the condition of the loop would have been broken completely.
Definitely use an iterator. Soon you will be able to use the 'auto' type, for better readability (one of your concerns) like this:
for (auto i = someVector.begin();
i != someVector.end();
++i)
Skip the index
The easiest approach is to sidestep the problem by using iterators, range-based for loops, or algorithms:
for (auto it = begin(v); it != end(v); ++it) { ... }
for (const auto &x : v) { ... }
std::for_each(v.begin(), v.end(), ...);
This is a nice solution if you don't actually need the index value. It also handles reverse loops easily.
Use an appropriate unsigned type
Another approach is to use the container's size type.
for (std::vector<T>::size_type i = 0; i < v.size(); ++i) { ... }
You can also use std::size_t (from <cstddef>). There are those who (correctly) point out that std::size_t may not be the same type as std::vector<T>::size_type (though it usually is). You can, however, be assured that the container's size_type will fit in a std::size_t. So everything is fine, unless you use certain styles for reverse loops. My preferred style for a reverse loop is this:
for (std::size_t i = v.size(); i-- > 0; ) { ... }
With this style, you can safely use std::size_t, even if it's a larger type than std::vector<T>::size_type. The style of reverse loops shown in some of the other answers require casting a -1 to exactly the right type and thus cannot use the easier-to-type std::size_t.
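The i-- > 0 style above can be wrapped in a small helper to check its behaviour (the visit_reverse name is made up for illustration): it visits indices size-1 down to 0, and it is safe for an empty vector too, because i-- > 0 is false immediately when size() == 0.

```cpp
#include <cstddef>
#include <vector>

// Records the order in which the reverse loop visits indices.
std::vector<std::size_t> visit_reverse(const std::vector<int>& v)
{
    std::vector<std::size_t> visited;
    for (std::size_t i = v.size(); i-- > 0; )
        visited.push_back(i);
    return visited;
}
```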
Use a signed type (carefully!)
If you really want to use a signed type (or if your style guide practically demands one), like int, then you can use this tiny function template that checks the underlying assumption in debug builds and makes the conversion explicit so that you don't get the compiler warning message:
#include <cassert>
#include <cstddef>
#include <limits>
template <typename ContainerType>
constexpr int size_as_int(const ContainerType &c) {
const auto size = c.size(); // if no auto, use `typename ContainerType::size_type`
assert(size <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
return static_cast<int>(size);
}
Now you can write:
for (int i = 0; i < size_as_int(v); ++i) { ... }
Or reverse loops in the traditional manner:
for (int i = size_as_int(v) - 1; i >= 0; --i) { ... }
The size_as_int trick is only slightly more typing than the loops with the implicit conversions, you get the underlying assumption checked at runtime, you silence the compiler warning with the explicit cast, you get the same speed as non-debug builds because it will almost certainly be inlined, and the optimized object code shouldn't be any larger because the template doesn't do anything the compiler wasn't already doing implicitly.
You're overthinking the problem.
Using a size_t variable is preferable, but if you don't trust your programmers to use unsigned correctly, go with the cast and just deal with the ugliness. Get an intern to change them all and don't worry about it after that. Turn on warnings as errors and no new ones will creep in. Your loops may be "ugly" now, but you can understand that as the consequences of your religious stance on signed versus unsigned.
vector.size() returns a size_t var, so just change int to size_t and it should be fine.
Richard's answer is more correct, except that it's a lot of work for a simple loop.
I notice that people have very different opinions about this subject. I also have an opinion which does not convince others, so it makes sense to look for support from some gurus, and I found the C++ Core Guidelines:
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
maintained by Bjarne Stroustrup and Herb Sutter, and their last update, upon which I base the information below, is of April 10, 2022.
Please take a look at the following code rules:
ES.100: Don’t mix signed and unsigned arithmetic
ES.101: Use unsigned types for bit manipulation
ES.102: Use signed types for arithmetic
ES.107: Don’t use unsigned for subscripts, prefer gsl::index
So, supposing that we want to index in a for loop and for some reason the range based for loop is not the appropriate solution, then using an unsigned type is also not the preferred solution. The suggested solution is using gsl::index.
But in case you don’t have gsl around and you don’t want to introduce it, what then?
In that case I would suggest to have a utility template function as suggested by Adrian McCarthy: size_as_int
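Alternatively, gsl::index is specified as an alias for std::ptrdiff_t, so if you don't want to pull in the GSL, a local alias gives the same guideline-conforming signed indexing (a sketch; sum_elements is an illustrative name):

```cpp
#include <cstddef>
#include <vector>

// Local stand-in for gsl::index: a signed type wide enough for any
// in-range subscript.
using index = std::ptrdiff_t;

int sum_elements(const std::vector<int>& v)
{
    int sum = 0;
    for (index i = 0; i < static_cast<index>(v.size()); ++i)
        sum += v[static_cast<std::size_t>(i)];
    return sum;
}
```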