Here's simple code:
import std.algorithm;
import std.array;
import std.file;

void main(string[] args)
{
    auto t = args[1].readText()
        .splitter('\n')
        .split("---")
        ;
}
Looks like it should work, but it won't compile. DMD 2.068.2 fails with this error:
Error: template std.algorithm.iteration.splitter cannot deduce function from
argument types !()(Result, string), candidates are:
...
Error: template instance std.array.split!(Result, string) error instantiating
It compiles if I insert .array before .split.
Am I missing something? Or is it a bug? I searched the bug tracker briefly, but didn't find anything.
Bottom line: problems like this can often be fixed by sticking a .array call right before the offending function. This provides it a buffer with enough functionality to run the algorithm.
What follows is the reasoning behind the library and a couple other ideas you can use to implement this too:
The reason this doesn't compile has to do with the philosophy behind std.algorithm and ranges: that they are as cheap as possible to push cost decisions to the top level.
In std.algorithm (and most well-written ranges and range-consuming algorithms), template constraints will reject any input that doesn't offer what it needs for free. Similarly, transforming ranges, like filter, splitter, etc., will return only those capabilities they can offer at minimal cost.
By rejecting them at compile time, they force the programmer to make the decision at the highest level as to how they want to pay those costs. You might rewrite the function to work differently, you might buffer it yourself with a variety of techniques to pay the costs up front, or whatever else you can find that works.
So here's what happens with your code: readText returns an array, which is a nearly full-featured range. (Since it returns a string, made of UTF-8, it doesn't actually offer random access as far as Phobos is concerned, though, confusingly, the language itself sees it differently; search the D forums for the "autodecode" controversy if you want to learn more. Finding a Unicode code point in a list of variable-length UTF-8 code units requires scanning it all. Scanning it all is not minimal cost, so Phobos will never attempt it unless you specifically ask for it.)
Anyway though, readText returns a range with plenty of features, including savability which splitter needs. Why does splitter need saving? Consider the result it promises: a range of strings starting at the last split point and continuing to the next split point. What does the implementation look like when writing this for a most generic range it can possibly do for cheap?
Something along these lines: first, save your starting position so you can return it later. Then, using popFront, advance through it until you find the split point. When you do, return the saved range up to the split point. Then, popFront past the split point and repeat the process until you've consumed the whole thing (while(!input.empty)).
So, since splitter's implementation required the ability to save the starting point, it requires at least a forward range (which is just a savable range. Andrei now feels naming things like this is a bit silly because there's so many names, but at the time he was writing std.algorithm he still believed in giving them all names).
Not all ranges are forward ranges! Arrays are: saving them is as easy as returning a slice from the current position. Many numerical algorithms are too: saving them just means keeping a copy of the current state. Most transformation ranges are savable if the range they are transforming is savable. Again, all they need to do is return the current state.
......as I write this, actually, I think your example should be savable. And, indeed, there is an overload that takes a predicate and compiles!
http://dlang.org/phobos/std_algorithm_iteration.html#.splitter.3
import std.algorithm;
import std.array;
import std.stdio;

void main(string[] args)
{
    auto t = "foo\n---\nbar"
        .splitter('\n')
        .filter!(e => e.length)
        .splitter!(a => a == "---")
        ;
    writeln(t);
}
Output: [["foo"], ["bar"]]
Yea, it compiled and split on lines equal to a particular thing. The other overload, .splitter("---"), fails to compile, because that overload requires slice functionality (or a narrow string, which Phobos refuses to slice generically... but knows it actually can be anyway, so the function is special-cased. You see that all over the library.)
But, why does it require slicing instead of just saving? Honestly, I don't know. Maybe I'm missing something too, but the existence of the overload that does work implies to me that my conception of the algorithm is correct; it can be done this way. I do believe slicing is a bit cheaper, but the save version is cheap enough too: you'd keep a count of how many items you popped past to get to the splitter, then return saved.take(that_count). Maybe that's the reason right there: you would iterate over the items twice, once inside the algorithm, then again outside, and the library considers that sufficiently costly to punt up a level. (The predicate version sidesteps this by making your function do the scanning, and thus Phobos considers it not its problem anymore; you are aware of what your own function is doing.)
I can see the logic in that. I could go both ways on it though, cuz the decision to actually run over it again is still on the outside, but I do understand why that might not be desirable to do without some thought.
Finally, why doesn't splitter offer indexing or slicing on its output? Why doesn't filter offer it either? Why DOES map offer it?
Well, it has to do with that low cost philosophy again. map can offer it (assuming its input does) because map doesn't actually change the number of elements: the first element in the output is also the first element in the input, just with some function run on the result. Ditto for the last, and all others in between.
filter changes that though. Filtering out the odd numbers of [1,2,3] yields just [2]: the length is different and 2 is now found at the beginning instead of the middle. But, you can't know where it is until you actually apply the filter - you can't jump around without buffering the result.
splitter is similar to filter. It changes the placement of elements, and the algorithm doesn't know where it splits until it actually runs through the elements. So it can tell as you iterate, but not ahead of iteration, so indexing would be O(n) speed - computationally too expensive. Indexing is supposed to be extremely cheap.
Anyway, now that we understand why the principle is there - to let you, the end programmer make decisions about costly things like buffering (which requires more memory than is free) or additional iteration (which requires more CPU time than is cost-free to the algorithm), and have some idea as to why splitter needs it by thinking about its implementation, we can look at ways to satisfy the algorithm: we need to either use the version that eats a few more CPU cycles and write it with our custom compare function (see sample above), or provide slicing somehow. The most straightforward way is by buffering the result in an array.
import std.algorithm;
import std.array;
import std.file;

void main(string[] args)
{
    auto t = args[1].readText()
        .splitter('\n')
        .array // add an explicit buffering call, understanding this will cost us some memory and cpu time
        .split("---")
        ;
}
You might also buffer it locally or something yourself to reduce the cost of the allocation, but however you do it, the cost has to be paid somewhere and Phobos prefers you the programmer, who understands the needs of your program and if you are willing to pay these costs or not, to make that decision instead of it paying it on your behalf without telling you.
Related
I'm trying to create a general memoizator for multiple and arbitrary functions.
For each function std::function<ReturnType(Args...)> that we want to memoize, we keep an unordered_map<Args ..., ReturnType> (I'm keeping things simple on purpose).
The big problem comes when our memoized function has some really big argument Args ...: for example, let's suppose that our function sorts a vector of 10 million numbers and then returns the sorted vector, so something like std::function<vector<double>(vector<double>)>.
As you can imagine, after inserting fewer than 100 vectors, we have already filled 8 GB of memory. Note that this may be due to the combination of huge vectors and the memory required by the sorting algorithm (I didn't investigate the causes).
So what if, instead of the structure described above, we define unordered_map<UUID(Args ...), ReturnType> (where UUID = Universally Unique Identifier)? We would have to give up determinism (so we might occasionally return a wrong result), but with a very low probability.
The problem is that since I never used UUIDs, I don't know if there are suitable implementations for this application.
So my question is:
Is there a better solution than UUIDs for this problem?
Which UUID implementation is best suited to this problem?
Is boost uuid a possible candidate?
Unfortunately, the problem could be solved for Args ... but not for ReturnType, so is there a solution for the memoized result?
Notice that the UUID generated for an object x should be the same across different runs and machines.
Notice that if we get the same UUID for two different objects (and so return the wrong value) with a really low probability, then it could be acceptable... let's say that this could be a "probabilistic memoizator".
I know that this application doesn't make sense in a memoization context (what are the odds that a user asks twice to sort the same 10-million-element vector?), but it's time and memory expensive (so good for benchmarking and for introducing the memory problem that I stated above), so please don't whip and crucify me because this is an absurd memoization application.
Identifying any object is easy. The address is "object identity" in C++. This is also the reason that even empty classes cannot have zero size.
Now, what you want is value equivalence. That's strictly not in the language domain. It's solidly in the application/library logic domain.
You should consider using something like boost::flyweights. It has precisely this facility, and makes it "easy" to customize the equivalence semantics for your types.
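The question's "probabilistic memoizator" idea, keying the cache on a fixed-size fingerprint of the argument rather than on the argument itself, might be sketched as follows. This is only an illustration under assumptions: `fingerprint` and `FingerprintMemo` are hypothetical names, 64-bit FNV-1a is swapped in as a simple deterministic hash (no real UUID library is used), and a collision would silently return the wrong result, which is the trade-off the question says it can accept. It does nothing about the size of the stored return values.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: 64-bit FNV-1a over the raw bytes of the vector.
// The same bytes always give the same value, so the result is stable across
// runs (assuming the same floating-point format and byte order).
// Caveat: 0.0 and -0.0 compare equal but have different bytes.
inline std::uint64_t fingerprint(const std::vector<double>& v) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (double d : v) {
        unsigned char bytes[sizeof(double)];
        std::memcpy(bytes, &d, sizeof(double));
        for (unsigned char b : bytes) {
            h ^= b;
            h *= 1099511628211ull;  // FNV prime
        }
    }
    return h;
}

// Hypothetical memoizer: stores the fingerprint of the argument, not the argument.
class FingerprintMemo {
public:
    using Fn = std::function<std::vector<double>(const std::vector<double>&)>;
    explicit FingerprintMemo(Fn fn) : fn_(std::move(fn)) {}

    const std::vector<double>& operator()(const std::vector<double>& arg) {
        const std::uint64_t key = fingerprint(arg);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++misses_;  // the wrapped function only runs on a cache miss
            it = cache_.emplace(key, fn_(arg)).first;
        }
        return it->second;
    }

    std::size_t misses() const { return misses_; }

private:
    Fn fn_;
    std::unordered_map<std::uint64_t, std::vector<double>> cache_;
    std::size_t misses_ = 0;
};
```

Calling the memoizer twice with equal vectors runs the wrapped function once; the key costs 8 bytes per entry regardless of how large the argument was.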
I am trying to construct a set in the following manner:
std::set<SomeType> mySet(aVector.begin(), aVector.end());
This line performs well in most cases. About 10% of the time, though, I run into cases where it takes too long to run (over 600 milliseconds in some cases!). Why could that be happening? The inputs are very similar each time (the vector is for the most part sorted). Any ideas?
I see three likely possibilities:
operator< for your structs isn't implementing a strict weak ordering, which is required for std::set to work correctly. Keep in mind that if your double values are ever NaN, you are breaking this assumption (for one of the sets that took a long time, check whether there are NaNs).
Occasionally your data isn't very sorted. Try always doing a std::sort on the vector first and see if the performance flattens out. Then default construct the set and use the std::set::insert overload that takes two parameters, the first being a hint for what element to compare against first (if you can provide a good hint). That will let you build the set without re-sorting. If that fixes the spikes, you know the initial sortedness of the data is the cause.
Your heap allocator occasionally does an operation that makes it take much longer than normal. It may be splitting or joining blocks to find free memory on the particular std::set() calls that are taking longer. You can try using an alternative allocator (if your program is multithreaded you might try Google's tcmalloc). You can rule this out if you have a profiler that shows time spent in the allocator, but most lack this feature. Another alternative would be to use a boost::intrusive_set, which will prevent the need for allocation when storing the items in the set.
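The first two possibilities above can be sketched in a few lines. This is a minimal illustration, assuming plain double stands in for the question's SomeType; `build_set_from_sorted` and `nan_breaks_strict_weak_ordering` are hypothetical names. The first function sorts and then uses the hinted std::set::insert overload; the second shows why a NaN under operator< violates the ordering requirements.

```cpp
#include <algorithm>
#include <limits>
#include <set>
#include <vector>

// Sort first, then insert each element with end() as the hint.
// With sorted input every new element belongs at the end, so the hint is
// always correct. Assumes the input contains no NaN.
inline std::set<double> build_set_from_sorted(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    std::set<double> result;
    for (double v : values) {
        result.insert(result.end(), v);  // hinted insert; duplicates are ignored
    }
    return result;
}

// Under operator<, NaN is "equivalent" to every value (neither compares less),
// while those values are not equivalent to each other, so equivalence is not
// transitive and the ordering is not a strict weak ordering.
inline bool nan_breaks_strict_weak_ordering() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double a = 1.0, b = 2.0;
    const bool a_equiv_nan = !(a < nan) && !(nan < a);
    const bool nan_equiv_b = !(nan < b) && !(b < nan);
    const bool a_equiv_b = !(a < b) && !(b < a);
    return a_equiv_nan && nan_equiv_b && !a_equiv_b;
}
```

If timing the sorted, hinted version removes the spikes, the input ordering is the likely culprit; if not, the comparison operator or the allocator remain as suspects.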
I'm developing game. I store my game-objects in this map:
std::map<std::string, Object*> mObjects;
std::string is the key/name used to find the object later in the code. It makes it very easy to refer to objects, like: mObjects["Player"] = .... But I'm afraid it's too slow because of the std::string allocation on each lookup in that map. So I decided to use int as the key in that map.
The first question: would that really be faster?
And the second: I don't want to give up my current way of accessing objects, so I found a way: compute a CRC of the string and store that as the key. For example:
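// Assumes mObjects is now keyed by the int checksum instead of std::string.
Object *getObject(const std::string &key) // const reference, so a string literal can be passed
{
    int checksum = GetCrc32(key);
    return mObjects[checksum];
}

Object *temp = getObject("Player");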
Or is this a bad idea? For calculating the CRC I would use boost::crc. Or is calculating the checksum much slower than searching a map keyed by std::string?
Calculating a CRC is sure to be slower than any single comparison of strings, but you can expect to do about log2(N) comparisons before finding the key (e.g. 10 comparisons for 1000 keys), so it depends on the size of your map. CRC can also result in collisions, so it's error prone (you could detect collisions relatively easily, and possibly even handle them to get correct results anyway, but you'd have to be very careful to get it right).
You could try an unordered_map<> (possibly called hash_map) if your C++ environment provides one - it may or may not be faster but won't be sorted if you iterate. Hash maps are yet another compromise:
the time to hash is probably similar to the time for your CRC, but
afterwards they can often seek directly to the value instead of having to do the binary-tree walk in a normal map
they prebundle a bit of logic to handle collisions.
(Silly point, but if you can continue to use ints and they can be contiguous, then do remember that you can replace the lookup with an array which is much faster. If the integers aren't actually contiguous, but aren't particularly sparse, you could use a sparse index e.g. array of 10000 short ints that are indices into 1000 packed records).
Bottom line is if you care enough to ask, you should implement these alternatives and benchmark them to see which really works best with your particular application, and if they really make any tangible difference. Any of them can be best in certain circumstances, and if you don't care enough to seriously compare them then it clearly means any of them will do.
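As a starting point for such a benchmark, the hash-map alternative might look like the sketch below. It assumes a C++11 compiler, where std::unordered_map is part of the standard library; `Object` here is a hypothetical stand-in for the game's type and `ObjectRegistry` is an illustrative name, not something from the question.

```cpp
#include <string>
#include <unordered_map>

struct Object { int id; };  // hypothetical stand-in for the game's Object type

class ObjectRegistry {
public:
    void add(const std::string& name, Object* object) { objects_[name] = object; }

    // Returns nullptr when the name is unknown.
    // Unlike operator[], find() never inserts a new entry.
    Object* find(const std::string& name) const {
        auto it = objects_.find(name);
        return it == objects_.end() ? nullptr : it->second;
    }

private:
    // The container hashes the string and resolves collisions itself,
    // so callers keep using names and cannot get a wrong object back.
    std::unordered_map<std::string, Object*> objects_;
};
```

Because the interface still takes names, swapping the underlying container between std::map and std::unordered_map for timing purposes only touches the private member.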
For the actual performance you need to profile the code and see. But I would be tempted to use hash_map. Although it's not part of the C++ standard library, most of the popular implementations provide it. It provides very fast lookup.
The first question: would that really be faster?
yes - you're comparing an int several times, vs comparing strings of arbitrary length several times.
checksum: Or is this a bad idea?
it's definitely not guaranteed to be unique. it's a bug waiting to bite.
what i'd do:
use multiple collections and embrace type safety:
// perhaps this simplifies things enough that t_player_id can be an int?
std::map<t_player_id, t_player> d_players;
std::map<t_ghoul_id, t_ghoul> d_ghouls;
std::map<t_carrot_id, t_carrot> d_carrots;
faster searches, more type safety. smaller collections. smaller allocations/resizes.... and on and on... if your app is very trivial, then this won't matter. use this approach going forward, and adjust after profiling/as needed for existing programs.
good luck
If you really want to know, you have to profile your code and see how long the function getObject takes. Personally I use valgrind and KCachegrind to profile and render the data on UNIX systems.
I think using id would be faster. It's faster to compare int than string so...
Which is better (or faster), a C++ for loop or the foreach operator provided by Qt? For example, given the following:
QList<QString> listofstrings;
Which is better?
foreach(QString str, listofstrings)
{
    //code
}
or
int count = listofstrings.count();
QString str = QString();
for(int i=0;i<count;i++)
{
    str = listofstrings.at(i);
    //Code
}
It really doesn't matter in most cases.
The large number of questions on StackOverflow regarding whether this method or that method is faster, belie the fact that, in the vast majority of cases, code spends most of its time sitting around waiting for users to do something.
If you are really concerned, profile it for yourself and act on what you find.
But I think you'll most likely find that only in the most intense data-processing-heavy work does this question matter. The difference may well be only a couple of seconds and even then, only when processing huge numbers of elements.
Get your code working first. Then get it working fast (and only if you find an actual performance issue).
Time spent optimising before you've finished the functionality and can properly profile, is mostly wasted time.
First off, I'd just like to say I agree with Pax, and that the speed probably doesn't enter into it. foreach wins hands down based on readability, and that's enough in 98% of cases.
But of course the Qt guys have looked into it and actually done some profiling:
http://blog.qt.io/blog/2009/01/23/iterating-efficiently/
The main lesson to take away from that is: use const references in read-only loops, as it avoids the creation of temporary instances. It also makes the purpose of the loop more explicit, regardless of the looping method you use.
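The effect of that advice can be made visible with a small sketch. It uses std::vector and a C++11 range-based for as stand-ins for QList and foreach (an assumption, since Qt is not used here), and `Tracked` is a hypothetical element type that counts its own copies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Tracked {
    static std::size_t copies;
    int value = 0;
    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked& operator=(const Tracked&) = default;
};
std::size_t Tracked::copies = 0;

// By-value loop variable: each iteration copies one element.
inline int sum_by_value(const std::vector<Tracked>& items) {
    int total = 0;
    for (Tracked item : items) total += item.value;
    return total;
}

// Const-reference loop variable: no element is copied.
inline int sum_by_const_ref(const std::vector<Tracked>& items) {
    int total = 0;
    for (const Tracked& item : items) total += item.value;
    return total;
}
```

Both functions return the same sum; only the by-value version increments Tracked::copies, once per element.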
It really doesn't matter. Odds are if your program is slow, this isn't the problem. However, it should be noted that you aren't making a completely equal comparison. Qt's foreach is more similar to this (this example will use QList<QString>):
for(QList<QString>::iterator it = Con.begin(); it != Con.end(); ++it) {
    QString &str = *it;
    // your code here
}
The macro is able to do this by using some compiler extensions (like GCC's __typeof__) to get the type of the container passed. Also imagine that boost's BOOST_FOREACH is very similar in concept.
The reason why your example isn't fair is that your non-Qt version is adding extra work.
You are indexing instead of really iterating. If you are using a type with non-contiguous allocation (I suspect this might be the case with QList<>), then indexing will be more expensive since the code has to calculate "where" the n-th item is.
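To make the indexing cost concrete, here is a sketch that uses std::list as a stand-in for "a type with non-contiguous allocation". That choice is an assumption for illustration only; it makes no claim about how QList is actually laid out.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// "Indexing" a linked list: every lookup walks from the head,
// so the whole loop does O(n^2) node hops.
inline long sum_by_index(const std::list<int>& values) {
    long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += *std::next(values.begin(), static_cast<std::ptrdiff_t>(i));
    }
    return total;
}

// Iterating: each step moves to the neighbouring node,
// so the whole loop does O(n) node hops.
inline long sum_by_iteration(const std::list<int>& values) {
    long total = 0;
    for (auto it = values.begin(); it != values.end(); ++it) total += *it;
    return total;
}
```

Both functions compute the same result; they differ only in how much traversal work is hidden behind each element access.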
That being said. It still doesn't matter. The timing difference between those two pieces of code will be negligible if existent at all. Don't waste your time worrying about it. Write whichever you find more clear and understandable.
EDIT: As a bonus, currently I strongly favor the C++11 version of container iteration, it is clean, concise and simple:
for(QString &s : Con) {
    // your code here
}
Since Qt 5.7 the foreach macro has been deprecated, and Qt encourages you to use the C++11 range-based for instead.
http://doc.qt.io/qt-5/qtglobal.html#foreach
(more details about the difference here : https://www.kdab.com/goodbye-q_foreach/)
I don't want to answer the question which is faster, but I do want to say which is better.
The biggest problem with Qt's foreach is the fact that it takes a copy of your container before iterating over it. You could say 'this doesn't matter because Qt classes are refcounted' but because a copy is used you don't actually change your original container at all.
In summary, Qt's foreach can only be used for read-only loops and thus should be avoided. Qt will happily let you write a foreach loop which you think will update/modify your container but in the end all changes are thrown away.
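The pitfall described above can be imitated without Qt. The sketch below uses std::vector and makes the copy explicit; it is an analogy under that assumption, not Qt's actual implementation, and both function names are hypothetical.

```cpp
#include <vector>

// Mimics a loop that runs over a copy of the container:
// the increments land in the copy and are thrown away.
inline void increment_via_copy(const std::vector<int>& original) {
    std::vector<int> snapshot = original;  // explicit copy, standing in for the one the answer describes
    for (int& x : snapshot) ++x;
}

// Iterates over the container itself by reference: the increments persist.
inline void increment_in_place(std::vector<int>& original) {
    for (int& x : original) ++x;
}
```

After increment_via_copy the caller's vector is unchanged, which is the surprise the answer warns about; increment_in_place behaves the way the loop body reads.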
First, I completely agree with the answer that "it doesn't matter". Pick the cleanest solution, and optimize if it becomes a problem.
But another way to look at it is that often, the fastest solution is the one that describes your intent most accurately. In this case, Qt's foreach says that you'd like to apply some action for each element in the container.
A plain for loop says that you'd like a counter i. You want to repeatedly add one to this value i, and as long as it is less than the number of elements in the container, you would like to perform some action.
In other words, the plain for loop overspecifies the problem. It adds a lot of requirements that aren't actually part of what you're trying to do. You don't care about the loop counter. But as soon as you write a for loop, it has to be there.
On the other hand, the Qt people have made no additional promises that may affect performance. They simply guarantee to iterate through the container and apply an action to each element.
In other words, often the cleanest and most elegant solution is also the fastest.
The foreach from Qt has a clearer syntax for the for loop IMHO, so it's better in that sense. Performance wise I doubt there's anything in it.
You could consider using BOOST_FOREACH instead, as it is a well thought out fancy for loop, and it's portable (and will presumably make its way into C++ some day, so it is future proof too).
A benchmark, and its results, on this can be found at http://richelbilderbeek.nl/CppExerciseAddOneAnswer.htm
IMHO (and many others here) it (that is speed) does not matter.
But feel free to draw your own conclusions.
For small collections, it shouldn't matter, and foreach tends to be clearer.
However, for larger collections, for will begin to beat foreach at some point (assuming that the 'at()' operator is efficient).
If this is really important (and I'm assuming it is since you are asking) then the best thing to do is measure it. A profiler should do the trick, or you could build a test version with some instrumentation.
You might look at the STL's for_each function. I don't know whether it will be faster than the two options you present, but it is more standardized than the Qt foreach and avoids some of the problems that you may run into with a regular for loop (namely out of bounds indexing and difficulties with translating the loop to a different data structure).
I would expect foreach to work nominally faster in some cases, and about the same in others, except in cases where the items are an actual array, in which case the performance difference is negligible.
If it is implemented on top of an enumerator, it may be more efficient than straight indexing, depending on the implementation. It's unlikely to be less efficient. For example, if someone exposed a balanced tree as both indexable and enumerable, then foreach will be decently faster. This is because each index lookup has to independently find the referenced item, while an enumerator has the context of the current node to navigate more efficiently to the next one.
If you have an actual array, then it depends on the implementation of the language and class whether foreach will be faster or the same as for.
If indexing is a literal memory offset (as in C++), then for should be marginally faster since you're avoiding a function call. If indexing is an indirection just like a call, then it should be the same.
All that being said... I find it hard to find a case for generalization here. This is the last sort of optimization you should be looking for, even if there is a performance problem in your application. If you have a performance problem that can be solved by changing how you iterate, you don't really have a performance problem. You have a BUG, because someone wrote either a really crappy iterator, or a really crappy indexer.
I seem to be seeing more 'for' loops over iterators in questions & answers here than I do for_each(), transform(), and the like. Scott Meyers suggests that STL algorithms are preferred, or at least he did in 2001. Of course, using them often means moving the loop body into a function or function object. Some may feel this is an unacceptable complication, while others may feel it better breaks down the problem.
So... should STL algorithms be preferred over hand-rolled loops?
It depends on:
Whether high-performance is required
The readability of the loop
Whether the algorithm is complex
If the loop isn't the bottleneck, and the algorithm is simple (like for_each), then for the current C++ standard, I'd prefer a hand-rolled loop for readability. (Locality of logic is key.)
However, now that C++0x/C++11 is supported by some major compilers, I'd say use STL algorithms because they now allow lambda expressions — and thus the locality of the logic.
I’m going to go against the grain here and advocate that using STL algorithms with functors makes code much easier to understand and maintain, but you have to do it right. You have to pay more attention to readability and clarity. In particular, you have to get the naming right. But when you do, you can end up with cleaner, clearer code, and a paradigm shift into more powerful coding techniques.
Let’s take an example. Here we have a group of children, and we want to set their “Foo Count” to some value. The standard for-loop, iterator approach is:
for (vector<Child>::iterator iter = children.begin();
     iter != children.end();
     ++iter)
{
    iter->setFooCount(n);
}
Which, yeah, it’s pretty clear, and definitely not bad code. You can figure it out with just a little bit of looking at it. But look at what we can do with an appropriate functor:
for_each(children.begin(), children.end(), SetFooCount(n));
Wow, that says exactly what we need. You don’t have to figure it out; you immediately know that it’s setting the “Foo Count” of every child. (It would be even clearer if we didn’t need the .begin() / .end() nonsense, but you can’t have everything, and they didn’t consult me when making the STL.)
Granted, you do need to define this magical functor, SetFooCount, but its definition is pretty boilerplate:
class SetFooCount
{
public:
    SetFooCount(int n) : fooCount(n) {}

    void operator () (Child& child)
    {
        child.setFooCount(fooCount);
    }

private:
    int fooCount;
};
In total it’s more code, and you have to look at another place to find out exactly what SetFooCount is doing. But because we named it well, 99% of the time we don’t have to look at the code for SetFooCount. We assume it does what it says, and we only have to look at the for_each line.
What I really like is that using the algorithms leads to a paradigm shift. Instead of thinking of a list as a collection of objects, and doing things to every element of the list, you think of the list as a first class entity, and you operate directly on the list itself. The for-loop iterates through the list, calling a member function on each element to set the Foo Count. Instead, I am doing one command, which sets the Foo Count of every element in the list. It’s subtle, but when you look at the forest instead of the trees, you gain more power.
So with a little thought and careful naming, we can use the STL algorithms to make cleaner, clearer code, and start thinking on a less granular level.
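For comparison, with a C++11 compiler the same call can be written with a lambda in place of the SetFooCount class. This is a sketch under assumptions: the `Child` class below is a hypothetical minimal version, and `setAllFooCounts` is an illustrative wrapper name that keeps the intent readable at the call site.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal Child, just enough for the example.
class Child {
public:
    void setFooCount(int n) { fooCount_ = n; }
    int fooCount() const { return fooCount_; }

private:
    int fooCount_ = 0;
};

// Same effect as for_each(children.begin(), children.end(), SetFooCount(n)),
// but the body of the operation stays next to the call.
inline void setAllFooCounts(std::vector<Child>& children, int n) {
    std::for_each(children.begin(), children.end(),
                  [n](Child& child) { child.setFooCount(n); });
}
```

The naming argument still applies: a well-named wrapper function plays the role that the well-named functor plays in the answer.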
std::for_each is the kind of code that made me curse the STL, years ago.
I cannot say if it's better, but I prefer to have the code of my loop right under the loop preamble. For me, it is a strong requirement. And the std::for_each construct won't allow me that (strangely enough, the foreach versions of Java or C# are cool, as far as I am concerned... So I guess it confirms that for me the locality of the loop body is very very important).
So I'll use for_each only if there is already a readable/understandable algorithm usable with it. If not, no, I won't. But this is a matter of taste, I guess, as I should perhaps try harder to understand and learn to parse all this...
Note that the people at boost apparently felt somewhat the same way, for they wrote BOOST_FOREACH:
#include <string>
#include <iostream>
#include <boost/foreach.hpp>

int main()
{
    std::string hello( "Hello, world!" );

    BOOST_FOREACH( char ch, hello )
    {
        std::cout << ch;
    }

    return 0;
}
See : http://www.boost.org/doc/libs/1_35_0/doc/html/foreach.html
That's really the one thing that Scott Meyers got wrong.
If there is an actual algorithm that matches what you need to do, then of course use the algorithm.
But if all you need to do is loop through a collection and do something to each item, just do the normal loop instead of trying to separate code out into a different functor, that just ends up dicing code up into bits without any real gain.
There are some other options like boost::bind or boost::lambda, but those are really complex template metaprogramming things, they do not work very well with debugging and stepping through the code so they should generally be avoided.
As others have mentioned, this will all change when lambda expressions become a first class citizen.
The for loop is imperative, the algorithms are declarative. When you write std::max_element, it’s obvious what you need, when you use a loop to achieve the same, it’s not necessarily so.
Algorithms also can have a slight performance edge. For example, when traversing an std::deque, a specialized algorithm can avoid checking redundantly whether a given increment moves the pointer over a chunk boundary.
However, complicated functor expressions quickly render algorithm invocations unreadable. If an explicit loop is more readable, use it. If an algorithm call can be expressed without ten-storey bind expressions, by all means prefer it. Readability is more important than performance here, because this kind of optimization is what Knuth so famously attributes to Hoare; you’ll be able to use another construct without trouble once you realize it’s a bottleneck.
It depends. If the algorithm doesn't take a functor, then always use the std algorithm version. It's both simpler for you to write and clearer.
For algorithms that take functors, generally no, until C++0x lambdas can be used. If the functor is small and the algorithm is complex (most aren't) then it may be better to still use the std algorithm.
I'm a big fan of the STL algorithms in principle, but in practice it's just way too cumbersome. By the time you define your functor/predicate classes, a two-line for loop can turn into 40+ lines of code that is suddenly 10x harder to figure out.
Thankfully, things are going to get a ton easier in C++0x with lambda functions, auto and the new for syntax. Check out the C++0x overview on Wikipedia.
I wouldn't use a hard and fast rule for it. There are many factors to consider: how often do you perform that certain operation in your code, is it just a loop or an "actual" algorithm, does the algorithm depend on a lot of context that you would have to transmit to your function?
For example I wouldn't put something like
for (int i = 0; i < some_vector.size(); i++)
    if (some_vector[i] == NULL) some_other_vector[i]++;
into an algorithm because it would result in a lot more code percentage wise and I would have to deal with getting some_other_vector known to the algorithm somehow.
There are a lot of other examples where using STL algorithms makes a lot of sense, but you need to decide on a case by case basis.
I think the STL algorithm interface is sub-optimal and should be avoided because using the STL toolkit directly (for algorithms) might give a very small gain in performance, but will definitely cost readability, maintainability, and even a bit of writeability when you're learning how to use the tools.
How much more efficient is a standard for loop over a vector:
int weighted_sum = 0;
for (int i = 0; i < a_vector.size(); ++i) {
    weighted_sum += (i + 1) * a_vector[i]; // Just writing something a little nontrivial.
}
than using a for_each construction, or trying to fit this into a call to accumulate?
You could argue that the iteration process is less efficient, but a for_each also introduces a function call at each step (which might be mitigated by trying to inline the function, but remember that "inline" is only a suggestion to the compiler - it may ignore it).
In any case, the difference is small. In my experience, over 90% of the code you write is not performance critical, but is coder-time critical. By keeping your STL loop all literally inline, it is very readable. There is less indirection to trip over, for yourself or future maintainers. If it's in your style guide, then you're saving some learning time for your coders (admit it, learning to properly use the STL the first time involves a few gotcha moments). This last bit is what I mean by a cost in writeability.
Of course there are some special cases -- for example, you might actually want that for_each function separated to re-use in several other places. Or, it might be one of those few highly performance-critical sections. But these are special cases -- exceptions rather than the rule.
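To see what "trying to fit this into a call to accumulate" costs in readability, here is the weighted sum written both ways. It is a sketch assuming a C++11 compiler (for the lambda); both function names are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// The hand-written loop from above, wrapped in a function.
inline int weighted_sum_loop(const std::vector<int>& a_vector) {
    int weighted_sum = 0;
    for (std::size_t i = 0; i < a_vector.size(); ++i) {
        weighted_sum += static_cast<int>(i + 1) * a_vector[i];
    }
    return weighted_sum;
}

// The same computation forced into std::accumulate: the 1-based weight has to
// be carried in a captured counter that the lambda mutates on every call.
inline int weighted_sum_accumulate(const std::vector<int>& a_vector) {
    int weight = 0;
    return std::accumulate(a_vector.begin(), a_vector.end(), 0,
                           [&weight](int sum, int x) { return sum + (++weight) * x; });
}
```

For the input {3, 1, 4} both return 1*3 + 2*1 + 3*4 = 17. The accumulate version works, but the hidden mutable counter is arguably harder to follow than the explicit index.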
IMO, a lot of standard library algorithms like std::for_each should be avoided - mainly for the lack-of-lambda issues mentioned by others, but also because there's such a thing as inappropriate hiding of details.
Of course hiding details away in functions and classes is all part of abstraction, and in general a library abstraction is better than reinventing the wheel. But a key skill with abstraction is knowing when to do it - and when not to do it. Excessive abstraction can damage readability, maintainability etc. Good judgement comes with experience, not from inflexible rules - though you must learn the rules before you learn to break them, of course.
OTOH, it's worth considering the fact that a lot of programmers have been using C++ (and before that, C, Pascal etc) for a long time. Old habits die hard, and there is this thing called cognitive dissonance which often leads to excuses and rationalisations. Don't jump to conclusions, though - it's at least as likely that the standards guys are guilty of post-decisional dissonance.
I think a big factor is the developer's comfort level.
It's probably true that using transform or for_each is the right thing to do, but it's not any more efficient, and handwritten loops aren't inherently dangerous. A developer might take half an hour to write a simple loop, versus half a day to get the syntax for transform or for_each right and move the provided code into a function or function object. And then other developers would need to know what was going on.
A new developer would probably be best served by learning to use transform and for_each rather than handmade loops, since he would be able to use them consistently without error. For the rest of us for whom writing loops is second nature, it's probably best to stick with what we know, and get more familiar with the algorithms in our spare time.
Put it this way -- if I told my boss I had spent the day converting handmade loops into for_each and transform calls, I doubt he'd be very pleased.