I have
typedef std::vector<int> IVec;
typedef std::vector<IVec> IMat;
and I would like to know how I can fill an IMat by using std algorithms, ie how to do the following with less code (all the IVecs have the same size) ?
void fill(IMat& mat){
for (int i=0;i<mat.size();i++){
for (int j=0;j<mat[i].size();j++){
mat[i][j] = i*j;
}
}
}
PS: already a way to fill the matrix with a constant would help me. And preferably with pre-C++11 algorithms.
The best solution is the one that you have already implemented. It takes advantage of using i/j as both offsets and as inputs to compute the algorithm.
Standard algorithms will have to use iterators for the elements and maintain counters. This data mirroring is a sure sign of a problem. But it can be done, even on one line if you wanna be fancy:
for_each(mat.begin(), mat.end(), [&](auto& i) { static auto row = 0; auto column = 0; generate(i.begin(), i.end(), [&]() { return row * column++; }); ++row; });
But as stated, just because it can be done doesn't mean that it should be done. The best way to approach this is the for-loop. Even doing it on one line is possible if that's your thing:
for(auto i = 0U;i < mat.size();i++) for(auto j = 0U;j < mat[i].size();j++) mat[i][j] = i*j;
Incidentally my standard algorithm works fine on Clang 3.7.0, gcc 5.1, and on Visual Studio 2015. However previously I used transform rather than generate. And there seem to be some implementation bugs in gcc 5.1 and Visual Studio 2015 with the captures of lambda scope static variables.
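For the pre-C++11 part of the question, here is a minimal sketch of both the constant fill and the i*j fill without lambdas. The names fill_constant, RowProduct and fill_products are made up for illustration. The sketch assumes std::generate keeps calling the same copy of the functor it was given, which is how the common implementations behave.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> IVec;
typedef std::vector<IVec> IMat;

// Fill every element of every row with the same value.
void fill_constant(IMat& mat, int value) {
    for (IMat::iterator row = mat.begin(); row != mat.end(); ++row) {
        std::fill(row->begin(), row->end(), value);
    }
}

// Hypothetical generator: yields row*0, row*1, row*2, ... on successive calls.
struct RowProduct {
    int row;
    int col;
    explicit RowProduct(int r) : row(r), col(0) {}
    int operator()() { return row * col++; }
};

// mat[i][j] = i * j, using one std::generate call per row.
void fill_products(IMat& mat) {
    for (std::size_t i = 0; i < mat.size(); ++i) {
        std::generate(mat[i].begin(), mat[i].end(),
                      RowProduct(static_cast<int>(i)));
    }
}
```

The outer loop still needs the row index, so this is not really shorter than the original double loop, which supports the point made above.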
I don't know if this is better than a double for loop, but one possible way you could do it using STL in C++11 would be using two for_each as follows:
int i(0);
std::for_each(mat.begin(), mat.end(),
[&i](IVec &ivec){int j(0); std::for_each(ivec.begin(), ivec.end(),
[&i,&j](int &k){k = i*j++;}); ++i;});
Just thought I'd comment further on Jonathan's excellent answer.
Ignore the c++11 syntax for now and imagine that we had written some supporting classes (doesn't matter how for now).
We could conceivably come up with code like this:
auto main() -> int
{
// define a matrix (vector of vectors)
IMat mat;
// resize it through some previously defined function
resize(mat, 10, 10);
// get an object that is a pseudo-container representing its extent
auto extent = extent_of(mat);
// set values through the pseudo-container, which forwards to the matrix
std::for_each(extent.begin(),
extent.end(),
[](auto pxy) { pxy.set_value(pxy.x * pxy.y); });
// or even
for (auto pxy : extent_of(mat)) {
pxy.set_value(product(pxy.coordinates()));
}
return 0;
}
100 lines of supporting code later (iterable containers and their proxies are not trivial) and this would compile and work.
Clever as it undoubtedly would be, there are some problems:
There's the small matter of the 100 extra lines of code.
It seems to me that this code is actually less expressive than yours. i.e. it's immediately obvious what your code is doing. With mine you have to make some assumptions or go and reason about the extra 100 lines of code.
My code needs a lot more maintenance (and documentation) than yours.
Sometimes less is more.
I thought about implementing a matrix class that uses std::transform from <algorithm> for calculation, but I found that in some situations it's faster to write loops.
Take a look at operator+= for element-wise addition. When the rhs matrix has one column and the same number of rows as the lhs matrix, I can do the following:
for (auto c = 0; c < cols(); ++c) {
std::transform(std::execution::par, col_begin(c), col_end(c), rhs.begin(), col_begin(c), std::plus<>());
}
or use simple loops:
auto lhsval = begin();
auto rhsval= rhs.begin();
for (auto r = 0; r < rows(); ++r) {
for (auto c = 0; c < cols(); ++c) {
*lhsval += *rhsval;
++lhsval;
}
++rhsval;
}
For your information, I wrote an iterator that accepts a step, so col_begin() returns an iterator whose operator++ skips over the other columns.
I timed the difference between both implementations using google benchmark and came to the conclusion that the loop is about 5 times faster than using std::transform. Well maybe there should be a difference, but not a difference that huge.
You can look at the complete code at my github repo
matrix class
matrix iterator
Passing std::execution::par is asking the library to parallelize this operation. This adds overhead, even if it is just to determine "your problem is too small to parallelize". The number of elements being transformed has to be quite large (sometimes hundreds of thousands or millions) before the parallelization is worthwhile, and requires that you have appropriate hardware (parallelizing on a two-core machine is much less likely to be worth it than on a 64-core machine).
The for loop version is much more similar to plain std::transform without the std::execution::par parameter. If you remove that parameter and the performance difference is still large, please update your question with that information, alongside your compiler version, platform, compiler switches and information about your data set: number of rows/columns, etc.
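To make the comparison concrete, here is a rough sketch of the same column broadcast with plain std::transform and no execution policy. The question's matrix class is not shown, so this assumes a row-major std::vector<double> instead; add_column and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Adds rhs[r] to every element of row r. lhs is assumed to hold
// rows * cols values in row-major order, and rhs to hold rows values.
void add_column(std::vector<double>& lhs, std::size_t rows, std::size_t cols,
                const std::vector<double>& rhs) {
    for (std::size_t r = 0; r < rows; ++r) {
        std::vector<double>::iterator first =
            lhs.begin() + static_cast<std::ptrdiff_t>(r * cols);
        const double v = rhs[r];
        // Sequential transform over one contiguous row.
        std::transform(first, first + static_cast<std::ptrdiff_t>(cols), first,
                       [v](double x) { return x + v; });
    }
}
```

Walking the storage row by row keeps the accesses contiguous, which is also what the hand-written loop in the question does. A strided column iterator touches memory in a different order, so that may account for part of the measured gap as well.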
I am trying to compute the product of the values inside a vector. It is a huge mess of code. I have posted it previously but no one was able to help. I just wanna confirm which is the correct way to do a single part of it. I currently have:
vector<double> taylorNumerator;
for(a = 0; a <= (constant); a++) {
double Number = equation involving a to get numerous values;
taylorNumerator.push_back(Number);
for(b = 0; b <= (constant); b++) {
double NewNumber *= taylorNumerator[b];
}
This is what I have as a snapshot, it is very short from what I actually have. Someone told me it is better to do vector.at(index) instead. Which is the correct or best way to accomplish this? If you so desire I can paste all of the code, it works but the values I get are wrong.
When possible, you should probably avoid using indexes at all. Your options are:
A range-based for loop:
for (auto numerator : taylorNumerators) { ... }
An iterator-based loop:
for (auto it = taylorNumerators.begin(); it != taylorNumerators.end(); ++it) { ... }
A standard algorithm, perhaps with a lambda:
#include <algorithm>
std::for_each(taylorNumerators.begin(), taylorNumerators.end(), [&](double numerator) { ... });
In particular, note that some algorithms let you specify a number of iterations, like std::generate_n, so you can create exactly n items without counting to n yourself.
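As a small sketch of that idea applied to the question: std::generate_n creates the terms and std::accumulate with std::multiplies forms their product, with no index in sight. The term formula here (1, 2, 3, ...) is only a placeholder, since the question does not show the real equation, and make_terms and product are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Builds n terms. The placeholder formula just yields 1.0, 2.0, 3.0, ...
std::vector<double> make_terms(std::size_t n) {
    std::vector<double> terms;
    terms.reserve(n);
    int a = 0;
    std::generate_n(std::back_inserter(terms), n,
                    [&a] { return static_cast<double>(++a); });
    return terms;
}

// Multiplies all elements together, starting from 1.0.
double product(const std::vector<double>& values) {
    return std::accumulate(values.begin(), values.end(), 1.0,
                           std::multiplies<double>());
}
```

Note that the initial value passed to std::accumulate is 1.0, not 0. Starting a product from an uninitialized or zero accumulator is an easy way to get wrong values.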
If you need the index in the calculation, then it can be appropriate to use a traditional for loop. You have to watch for a couple of pitfalls: std::vector<T>::size() returns a std::vector<T>::size_type, which is typically identical to std::size_t, which is (1) unsigned and (2) quite possibly larger than an int.
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) { ... }
Your calculations probably deal with doubles or some numerical type other than std::size_t, so you have to consider the best way to convert it. Many programmers would rely on implicit conversions, but that can be dangerous unless you know the conversion rules very well. I'd generally start by doing a static cast of the index to the type I actually need. For example:
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) {
const auto x = static_cast<double>(i);
/* calculation involving x */
}
In C++, it's probably far more common to make sure the index is in range and then use operator[] rather than to use at(). Many projects disable exceptions, so the safety guarantee of at() wouldn't really be available. And, if you can check the range once yourself, then it'll be faster to use operator[] than to rely on the range-check built into at() on each index operation.
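A short sketch of the difference, assuming exceptions are enabled: at() throws std::out_of_range for a bad index, while the second function checks the bound once and then uses operator[] inside the loop. Both function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true when v.at(i) rejects the index by throwing std::out_of_range.
bool at_rejects(const std::vector<double>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Sums the first n elements. n is clamped once up front, so the loop
// body can use the unchecked operator[] safely.
double sum_first(const std::vector<double>& v, std::size_t n) {
    if (n > v.size()) n = v.size();
    double total = 0.0;
    for (std::size_t i = 0; i != n; ++i) total += v[i];
    return total;
}
```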
What you have is fine. Modern compilers can optimize the heck out of the above such that the code is just as fast as the equivalent C code accessing the items directly.
The only optimization for using vector I recommend is to invoke taylorNumerator.reserve(constant + 1) (the loop runs from 0 through constant inclusive) to allocate the needed storage upfront instead of having the vector resize itself as new items are added.
About the only worthy optimization after that is to not use vector at all and just use a static array - especially if constant is small enough that it doesn't blow up the stack (or binary size if global).
double taylorNumerator[constant];
I was wondering whether it is possible in C++11 to use the new range-based for loop over multiple containers, for example:
std::vector<double> x;
std::vector<double> y;
for (double& xp, yp : x, y)
{
std::cout << xp << yp << std::endl;
}
I was not able to find any information about using this loop for more than one container. I would appreciate all help.
Example effect in the classic for loop:
std::vector<double>::iterator itX = m_x.begin();
std::vector<double>::iterator itY = m_y.begin();
for (uint32_t i = 0; i < m_x.size(); i++, itX++, itY++)
{
// operations on the m_x and m_y vectors
}
There is a request in the language working group to support a very similar syntax to iterate simultaneously on many containers:
Section: 6.5.4 [stmt.ranged] Status: Open Submitter: Gabriel Dos Reis
Opened: 2013-01-12 Last modified: 2015-05-22
Discussion:
The new-style 'for' syntax allows us to dispense with administrative iterator declarations when iterating over a single sequence. The burden and noise remain, however, when iterating over two or more sequences simultaneously. We should extend the syntax to allow that. E.g. one should be able to write:
for (auto& x : v; auto& y : w)
a = combine(v, w, a);
instead of the noisier
auto p1 = v.begin();
auto q1 = v.end();
auto p2 = w.begin();
auto q2 = w.end();
while (p1 < q1 and p2 < q2) {
a = combine(*p1, *p2, a);
++p1;
++p2;
}
See http://cplusplus.github.io/EWG/ewg-active.html#43
So it could happen but not in the near future.
Meanwhile the best choice is probably the classical for loop.
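If the classical loop is needed in many places, it can be wrapped once in a small helper. This is only a sketch: for_each_pair is a made-up name, not a standard algorithm, and it stops at the end of the shorter container.

```cpp
#include <cassert>
#include <vector>

// Calls f(a_elem, b_elem) for each pair of corresponding elements,
// stopping as soon as either container is exhausted.
template <typename C1, typename C2, typename F>
void for_each_pair(C1& a, C2& b, F f) {
    auto i = a.begin();
    auto j = b.begin();
    for (; i != a.end() && j != b.end(); ++i, ++j) {
        f(*i, *j);
    }
}
```

A call then reads for_each_pair(x, y, [](double& xp, double& yp) { ... });, which is close to the syntax asked for in the question.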
I know this is a little old now, but I had a similar query and T.C.'s "you'd need some library help" comment made me grin, because I ended up solving a similar issue with only a few extra characters. To recycle the OP's first example, and assuming as stated that the vectors are guaranteed the same size, you can make C++11 referencing meet old-school pointer arithmetic, like so:
std::vector<double> x;
std::vector<double> y;
for (double &xp:x)
{
std::cout << xp << y[&xp-&x[0]] << std::endl;
}
Probably best not used in production code (I'm a self-taught hobbyist coder, so meh) but it works well enough, is simple, requires no libraries and does not seem to suffer any notable speed penalty (at least, that I've noticed). It even works in minimal C++11 environments (such as VC11). To be clear, I used this in a slightly different context, where (given the above example) I only accessed y[] when a change was actually required (so any possible speed reduction is mitigated in the noise), but since the y[] access was required at the same index as in x[] and required no extra variables/iterators, it fit the bill perfectly.
Also, +1 for manlio's answer; would you believe I actually tried to use exactly that syntax intuitively before I went looking for an alternative? :O
While thinking about this question I started wondering whether std::copy() and/or std::fill are specialized (I really mean optimized) for std::vector<bool>.
Is this required by the C++ standard, or is it perhaps a common approach among standard library vendors?
Simply put, I would like to know whether the following code:
std::vector<bool> v(10, false);
std::fill(v.begin(), v.end(), true);
is in any way better/different than that:
std::vector<bool> v(10, false);
for (auto it = v.begin(); it != v.end(); ++it) *it = true;
To be very strict: can, let's say, std::fill<std::vector<bool>::iterator>() go into the internal representation of std::vector<bool> and set entire bytes instead of single bits? I assume making std::fill a friend of std::vector<bool> is not a big problem for a library vendor?
[UPDATE]
Next related question: can I (or anybody else :) specialize such algorithms for, let's say, std::vector<bool>, if they are not already specialized? Is this allowed by the C++ standard? I know this would be non-portable, but what about just for one selected standard library implementation? Assume I (or anybody else) find a way to get to std::vector<bool>'s private parts.
The standard library is mostly header-only and ships with your compiler, so you can look into those headers yourself. GCC's vector<bool> implementation is in stl_bvector.h; other implementations have their own equivalent header. And yes, there is a specialized fill (look near __fill_bvector).
Optimizations are nowhere mandated in the standard. Whether an optimization is applied is assumed to be a "quality of implementation" issue. The asymptotic complexity of most algorithms is, however, restricted.
Optimizations are allowed as long as a correct program behaves according to what the standard mandates. The examples you ask about, i.e., optimizations involving standard algorithms using iterators on std::vector<bool>, can achieve their objective pretty much in any way the implementation sees fit because there is no way to monitor how they are implemented. This said, I doubt very much that there is any standard library implementation optimizing operations on std::vector<bool>. Most people seem to think that this specialization is an abomination in the first place and that it should go away.
A user is only allowed to create specializations of library types if the specialization involves at least one user-defined type. I don't think a user is allowed to provide any function in namespace std at all: there isn't any need, because all such functions would involve a user-defined type and would thus be found in the user's namespace. Formulated differently: I think you are out of luck with respect to getting algorithms optimized for std::vector<bool> for the time being. You might consider contributing optimized versions to the open source implementations (e.g., libstdc++ and libc++), however.
There is no specialization for it, but you can still use it. (even though it's slow)
But here is a trick I found which enables a fast std::fill on std::vector<bool>, using MSVC's internal word type std::_Vbase.
(WARNING: I've tested it only for MSVC2013, so it may not work on other compilers.)
int num_bits = 100000;
std::vector<bool> bit_set(num_bits , true);
int bitsize_elem = sizeof(std::_Vbase) * 8; // 1byte = 8bits
int num_elems = static_cast<int>(std::ceil(num_bits / static_cast<double>(bitsize_elem)));
Here, since a partially used element still occupies a whole element, the number of elements must be rounded up.
Using this information, we will build a vector of pointers that point to the original elements underlying the bits.
std::vector<std::_Vbase*> elem_ptrs(num_elems, nullptr);
std::vector<bool>::iterator bitset_iter = bit_set.begin();
for (int i = 0; i < num_elems; ++i)
{
std::_Vbase* elem_ptr = const_cast<std::_Vbase*>((*bitset_iter)._Myptr);
elem_ptrs[i] = elem_ptr;
std::advance(bitset_iter, bitsize_elem);
}
(*bitset_iter)._Myptr : By dereferencing the iterator of std::vector<bool>, you can access the proxy class reference and its member _Myptr.
Since _Myptr is of type const std::_Vbase*, remove its constness with const_cast.
Now we have a pointer to the original element underlying those bits, std::_Vbase* elem_ptr.
elem_ptrs[i] = elem_ptr : Record this pointer,...
std::advance(bitset_iter, bitsize_elem) : ...and continue our journey to find the next element, by jumping bits held by the previous element.
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0); // fill every bits "false"
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, -1); // fill every bits "true"
Now we can use std::fill on the underlying elements through those pointers, rather than on the individual bits.
Some may feel uncomfortable about using implementation internals from outside and even removing their constness.
But if you don't care about that and want something fast, this was the fastest way in my tests.
I did some comparisons below. (made new project, nothing changed config, release, x64)
int it_max = 10; // do it 10 times ...
int num_bits = std::numeric_limits<int>::max(); // 2147483647
std::vector<bool> bit_set(num_bits, true);
for (int it_count = 0; it_count < it_max; ++it_count)
{
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0);
} // Elapse Time : 0.397sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
std::fill(bit_set.begin(), bit_set.end(), false);
} // Elapse Time : 18.734sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
for (int i = 0; i < num_bits; ++i)
{
bit_set[i] = false;
}
} // Elapse Time : 21.498sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
bit_set.assign(num_bits, false);
} // Elapse Time : 21.779sec
for (int it_count = 0; it_count < it_max; ++it_count)
{
bit_set.swap(std::vector<bool>(num_bits, false)); // You can not use elem_ptrs anymore
} // Elapse Time : 1.3sec
There is one caveat. When you swap() the original vector with another one, then the vector of pointers becomes useless!
23.2.5 Class vector from the C++ International Standard goes as far as to tell us
To optimize space allocation, a specialization of vector for bool elements is provided:
after which the bit-packed specialization is declared. That's as far as the standard goes regarding vector<bool>: vendors are expected to pack the elements as bits to optimize for space. Optimizing for space comes at a cost here, namely speed.
It's easier to get a book from the library than it is to find a book if it were between all the library books stapled closely together in containers....
Take your example: you're trying to do a std::fill or std::copy from begin to end. But that's not always the case; sometimes the range doesn't simply map to whole bytes. So that's a bit of a problem in terms of speed optimization. It's easy in the case where you have to change every bit to one, since that's just setting the bytes to 0xFF, but that's not the case here; it becomes much harder if you only change certain bits of a byte. Then you need to actually compute what the byte will be; that's not a trivial thing to do*, or at least not as an atomic operation on current hardware.
It's the premature optimization story, it's nice in terms of space but horrible in terms of performance.
Is having a "is a multiple of 8 bits" check worth the overhead? I doubt it.
* We're talking about multiple bits here, for the case it's just one bit you can of course do a bit operation.
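To illustrate what computing the partial bytes involves, here is a hedged sketch that sets an arbitrary bit range in a packed array of 64-bit words. It is not how any particular standard library implements vector<bool>; set_bits and the word layout (bit k lives in words[k / 64], at position k % 64) are assumptions made for the example. The caller must make sure last does not exceed 64 * words.size().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sets bits [first, last) to one. Whole words in the middle are
// overwritten directly; the first and last word need a mask.
void set_bits(std::vector<std::uint64_t>& words,
              std::size_t first, std::size_t last) {
    if (first >= last) return;
    const std::uint64_t all = ~std::uint64_t(0);
    const std::size_t first_word = first / 64;
    const std::size_t last_word = (last - 1) / 64;
    // Mask covering bit (first % 64) and everything above it.
    const std::uint64_t head = all << (first % 64);
    // Mask covering bit ((last - 1) % 64) and everything below it.
    const std::uint64_t tail = all >> (63 - (last - 1) % 64);
    if (first_word == last_word) {
        words[first_word] |= head & tail;
        return;
    }
    words[first_word] |= head;
    for (std::size_t w = first_word + 1; w < last_word; ++w) {
        words[w] = all;
    }
    words[last_word] |= tail;
}
```

The masking costs a handful of shifts per call, not per bit, so how much it matters depends on how long the filled ranges typically are.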
I was wondering if there's a neater (or better yet, more efficient) method of summing the values of a vector/(asymmetric) matrix pointed to by a collection of indices. (A matrix with structure such as symmetry could of course be exploited in the looping, but that's not pertinent to my question.) Basically this code could be used to calculate, say, the cost of a route through a 2D matrix. I'm looking for a way to utilize the CPU, not the GPU.
Here's some relevant code; the case I'm more interested in is the first one. I was thinking it's possible to use std::accumulate with a lambda that captures the indices vector, but then I got wondering if there's already a neater way, perhaps with some other operator. Not a "real problem", as looping is quite clear for my tastes too, but I'm on the hunt for the super-neat or more efficient one-liner...
template<typename out_type>
out_type sum(std::vector<float> const& matrix, std::vector<int> const& indices)
{
out_type cost = 0;
for(decltype(indices.size()) i = 0; i < indices.size() - 1; ++i)
{
const int index = indices.size() * indices[i] + indices[i + 1];
cost += matrix[index];
}
const int index = indices.size() * indices[indices.size() - 1] + indices[0];
cost += matrix[index];
return cost;
}
template<typename out_type>
out_type sum(std::vector<std::vector<float>> const& matrix, std::vector<int> const& indices)
{
out_type cost = 0;
for(decltype(indices.size()) i = 0; i < indices.size() - 1; i++)
{
cost += matrix[indices[i]][indices[i + 1]];
}
cost += matrix[indices[indices.size() - 1]][indices[0]];
return cost;
}
Oh, and PPL/TBB are fair game too.
Edit
As an afterthought and as commented to John, would there be a place to employ std::common_type in the calculation as the input and output types may differ? This is a bit of hand-waving and more like learning techniques and libraries. A form of code kata, if you will.
Edit 2
Now, there's one option to make the loops faster, explained in the blog post How to process a STL vector using SSE code by the blogger theowl84. The code uses __m128 directly, but I wonder if there's something in the DirectXMath library too.
Edit 3
Now, after writing some concrete code, I found std::accumulate wouldn't get me far. Or at least I couldn't find a neat way to do the indices[i + 1] part of matrix[indices[i]][indices[i + 1]], as std::accumulate itself gives access to only the current value and the sum. In that light, it looks like novelocrat's approach would be the most fruitful one.
DeadMG proposed using parallel_reduce with associativity caveats, further commented on by novelocrat. I didn't check whether I could use parallel_reduce, as the interface looked somewhat cumbersome for a quick try. Other than that, even though my code executes serially, it would suffer from the same floating-point issues as the parallel reduction version, though the parallel version could be (much) more unpredictable than the serial one, I think.
This is somewhat tangential, but those who have read this far may be (very) interested in the article Wandering Precision on The NAG blog, which details some intricacies introduced even by hardware instruction re-ordering! Then there are some ruminations about this very issue in a distributed setting in the #AltDevBlogADay article Synchronous RTS Engines and a Tale of Desyncs. Also, ACCU (the general mailing list is excellent, by the way, and it's free to join) features several articles (e.g. this) on floating point accuracy. As a tangent to the tangent, I found Fernando Cacciola's Robustness issues in geometric computing to be a good article to read, originally from the ACCU mailing list.
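The associativity caveat is easy to reproduce. In this minimal sketch (sum_left and sum_right are made-up names) the same three doubles are added with two different groupings. With 1e30, -1e30 and 1.0 as inputs, the 1.0 is absorbed when it is first added to -1e30, so the two groupings disagree.

```cpp
#include <cassert>

// (a + b) + c: the grouping a serial left-to-right loop uses.
double sum_left(double a, double b, double c) { return (a + b) + c; }

// a + (b + c): a grouping a parallel reduction might end up using.
double sum_right(double a, double b, double c) { return a + (b + c); }
```

This assumes the compiler follows IEEE semantics. Flags that permit reassociation, such as fast-math style options, can change the outcome.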
And then there's std::common_type. I couldn't find a use for it. If I had two different types as parameters, then the return value could/should be decided by std::common_type. Perhaps more pertinent is std::is_convertible with static_assert, to make sure the desired result type is convertible from the argument types (with a clean error message). Other than that, I can only think of a check that the accuracy of the return value/intermediate calculation value is sufficient to represent the result of the summation without overflows and things like that, but I haven't come across a standard facility for that.
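A minimal sketch of the static_assert idea, with a made-up function name sum_as: the element type must be convertible to the requested result type, otherwise compilation stops with a readable message.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Sums values into an accumulator of type Out. The static_assert turns
// an unsupported type combination into a clean compile-time error.
template <typename Out, typename In>
Out sum_as(const std::vector<In>& values) {
    static_assert(std::is_convertible<In, Out>::value,
                  "element type must be convertible to the result type");
    Out total = Out();
    for (const In& v : values) {
        total += static_cast<Out>(v);
    }
    return total;
}
```

Convertibility says nothing about range or precision, so this still does not guard against overflow or rounding in the accumulator.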
That about that, I think, ladies and gentlemen. I enjoyed myself, I hope those reading this got something out of this too.
You could produce an iterator that takes matrix and indices and yields the appropriate values.
class route_iterator
{
vector<vector<float>> const& matrix;
vector<int> const& indices;
int i;
public:
route_iterator(vector<vector<float>> const& matrix_, vector<int> const& indices_,
int begin = 0)
: matrix(matrix_), indices(indices_), i(begin)
{ }
float operator*() {
return matrix[indices[i]][indices[(i + 1) % indices.size()]];
}
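route_iterator& operator++() {
++i;
return *this;
}
// std::accumulate compares iterators to find the end of the range
bool operator!=(route_iterator const& other) const {
return i != other.i;
}
};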
Then your accumulate runs from route_iterator(matrix, indices) to route_iterator(matrix, indices, indices.size()).
Admittedly, though, this sequentializes without a smart compiler turning it into something parallel. What you really want are parallel map and fold (accumulate) operations.
out_type cost = 0;
for(decltype(indices.size()) i = 0; i < indices.size() - 1; i++)
{
cost += matrix[indices[i]][indices[i + 1]];
}
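This is basically std::accumulate. PPL provides (and so does TBB, if I recall) parallel_reduce. This requires associativity but not commutativity. + over the reals and integers is associative; floating-point addition is not strictly associative, which means a parallel reduction may give a slightly different result from the serial loop.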
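Because each term needs two neighbouring indices, one serial way to express the loop with a standard algorithm is std::inner_product over the index sequence and the same sequence shifted by one. This is only a sketch: route_cost is a made-up name, and every index is assumed to be a valid row and column of the matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Sums matrix[indices[i]][indices[i + 1]] over the route, including
// the closing edge from the last index back to the first.
template <typename Out>
Out route_cost(const std::vector<std::vector<float>>& matrix,
               const std::vector<int>& indices) {
    if (indices.empty()) return Out(0);
    auto edge = [&matrix](int from, int to) {
        return static_cast<Out>(matrix[static_cast<std::size_t>(from)]
                                      [static_cast<std::size_t>(to)]);
    };
    // Pairs (indices[i], indices[i + 1]) are turned into edge costs by
    // 'edge' and combined with std::plus.
    const Out open_path = std::inner_product(
        indices.begin(), indices.end() - 1, indices.begin() + 1,
        Out(0), std::plus<Out>(), edge);
    return open_path + edge(indices.back(), indices.front());
}
```

Unlike the original loop, the explicit empty check avoids the unsigned wrap-around of indices.size() - 1 when the route is empty.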