Using File().byLine() with fold() - D

I am trying to use the fold operation on a range returned by byLine(). I want the lambda passed to fold to be a multi-line function. I have searched Google and read the documentation, but cannot find a description of what the function's signature should be. I surmise that one of the two parameters is the accumulated sum and the other is the current element. This is what I have, but it will not build:
auto sum = File( fileName, "r" )
.byLine
.fold!( (a, b)
{
auto len = b.length;
return a + len;
});
The error I get back from dmd is:
main.d(22): Error: no property `fold` for `(File(null, null)).this(fileName, "r").byLine(Flag.no, '\n')` of type `std.stdio.File.ByLineImpl!(char, char)`
So my question is twofold:
Is my use of fold in this case valid?
How do I pass a curly-brace lambda to fold?
I have tried searching Google and reading the dlang documentation for fold. All of the documentation uses the shortcut lambda syntax (a, b) => a + b.

The way fold works is that it accepts a list of function aliases describing how to fold the next element in. If you don't provide it with a starting value, it uses the first element as the starting value. Quoting the documentation (emphasis mine):
The call fold!(fun)(range, seed) first assigns seed to an internal
variable result, also called the accumulator. Then, for each element
x in range, result = fun(result, x) gets evaluated. Finally, result
is returned. The one-argument version fold!(fun)(range) works
similarly, but it uses the first element of the range as the seed
(the range must be non-empty).
The reason why your original code didn't work is because you can't add an integer to a string (the seed was the first line of the file).
The reason why your latest version works (although only on 32-bit machines, since you can't reassign a size_t to an int on 64-bit machines) is because you gave it a starting value of 0 to seed the fold. So that is the correct mechanism to use for your use case.
The documentation is a bit odd, because the function is actually not an eponymous template, so it has two parts to the documentation -- one for the template, and one for the fold function. The fold function doc lists the runtime parameters that are accepted by fold, in this case, the input range and the seed. The documentation link for it is here: https://dlang.org/phobos/std_algorithm_iteration.html#.fold.fold

I was able to tweak the answer provided by Akshay. The following compiled and ran:
module example;
import std.stdio;
import std.algorithm.iteration : fold;
void main() {
string fileName = "test1.txt";
auto sum = File(fileName, "r")
.byLine
.fold!( (a, b) {
// You can define the lambda function using the `{}` syntax
auto len = b.length;
return a + len;
})(0); // Initialize the fold with a value of 0
}


Storing 2 variables at once from a tuple function

I have a tuple function that returns a tuple of the form
<node*,int>
Is there a way to store both values at once without creating another tuple? I know we can do
n,score=tuplefunct(abc);
in Python. But if I want to store both return values in C++ without making another tuple, I need to call the function twice:
n=get<0>(tuplefunct(abc));
score=get<1>(tuplefunct(abc));
Is there any alternative in C++ that stores the values at once?
You don't need to call the function twice (note that there is no "another tuple" involved; the function returns one and that's what you use):
auto x = tuplefunct(abc);
auto n = get<0>(x);
auto score = get<1>(x);
If you have C++17 available you can use structured bindings
auto [n,score] = tuplefunct(abc);
Or to get close to that without C++17, you can use std::tie (from C++11 on):
node* n;
int score;
std::tie(n,score) = tuplefunct(abc);
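The two options above can be put side by side in a small sketch. The function best_match below is a hypothetical stand-in for tuplefunct, and it returns a std::string instead of a node* purely to keep the example self-contained; the structured-bindings version assumes a C++17 compiler.

```cpp
#include <string>
#include <tuple>

// Hypothetical stand-in for the question's tuplefunct: returns a name and a score.
std::tuple<std::string, int> best_match(int seed) {
    return std::make_tuple("node" + std::to_string(seed), seed * 10);
}

// C++11 route: std::tie assigns into variables that already exist.
int score_via_tie(int seed) {
    std::string name;
    int score = 0;
    std::tie(name, score) = best_match(seed);
    return score;
}

// std::ignore skips a tuple element you do not need.
std::string name_only(int seed) {
    std::string name;
    std::tie(name, std::ignore) = best_match(seed);
    return name;
}

// C++17 route: structured bindings declare both variables in one statement.
std::string describe(int seed) {
    auto [name, score] = best_match(seed);
    return name + ":" + std::to_string(score);
}
```

In every variant the function is called exactly once; the only difference is whether the target variables are declared beforehand (std::tie) or as part of the statement (structured bindings).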

How to search by member accessor value with std::find_if()?

I am learning C++ at the moment and have an example program implemented with an array of objects data store. To make some other operations easier, I have changed the store to a vector. With this change I am now not sure of the best way to search the store to find an object based on a member accessor value.
Initially I used a simple loop:
vector<Composer> composers; // where Composer has a member function get_last_name() that returns a string
Composer& Database::get_composer(string last_name)
{
for (Composer& c : composers)
if (c.get_last_name().compare(last_name))
return c;
throw std::out_of_range("Composer not found");
}
This works just fine of course, but to experiment I wanted to see if there were vector specific functions that could also do the job. So far I have settled on trying to use find_if() (if there is a better function, please suggest).
However, I am not sure exactly the correct way to use find_if(). Based on code seen in online research I have replaced the above with the following:
vector<Composer> composers; // where Composer has a member function get_last_name() that returns a string
Composer& Database::get_composer(string last_name)
{
auto found = find_if(composers.begin(), composers.end(),
[last_name](Composer& c) -> bool {c.get_last_name().compare(last_name);});
if (found == composers.end())
throw out_of_range("Composer not found");
else
return *found;
}
This does not work. It does find a result, but it is the incorrect one. If an argument that matches, say the third composer's last name the function always returns the first item from the vector (if I pass an argument that doesn't match any last name the function correctly throws an exception)... what am I doing wrong?
You are on the right track; your lambda needs a return statement. In this case you also do not have to specify its return type explicitly, since it can be deduced:
find_if(composers.begin(), composers.end(),
[last_name](const Composer& c) { return c.get_last_name() == last_name; });
Your original lambda is declared to return bool but has no return statement, so the compiler should at least emit a warning; pay attention to such warnings.
Note: it is not clear how your original code worked if you tested it, it should be:
if (c.get_last_name().compare(last_name) == 0 )
or simply:
if (c.get_last_name() == last_name )
as std::string::compare() returns an int that is negative, zero, or positive (zero means the strings are equal), so your original condition is true for strings that do not match last_name.
With range-v3, you may use projection:
auto it = ranges::find(composers, last_name, &Composer::get_last_name);
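For reference, here is a minimal compilable sketch of the corrected find_if lookup. The Composer class is a stripped-down, hypothetical version of the one in the question, and get_composer is written as a free function taking the vector as a parameter (rather than a Database member) so the example stands alone.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal Composer; the class in the question has more members.
class Composer {
public:
    explicit Composer(std::string last_name) : last_name_(std::move(last_name)) {}
    const std::string& get_last_name() const { return last_name_; }
private:
    std::string last_name_;
};

// Returns the first composer whose last name equals last_name, or throws.
Composer& get_composer(std::vector<Composer>& composers, const std::string& last_name) {
    auto found = std::find_if(composers.begin(), composers.end(),
        // The lambda must *return* the comparison result; == yields a bool directly.
        [&last_name](const Composer& c) { return c.get_last_name() == last_name; });
    if (found == composers.end())
        throw std::out_of_range("Composer not found");
    return *found;
}
```

Capturing last_name by reference avoids copying the string into the lambda; that is safe here because the lambda does not outlive the call.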

Dividing each element in a container between a given number C++

I was multiplying each container against another number so I did the following:
local_it begin = magnitudesBegin;
std::advance(begin , 2);
local_it end = magnitudesBegin;
std::advance(end, 14);
std::transform(begin, end, firstHalf.begin(),
std::bind1st(std::multiplies<double>(),100));
It worked wonders, problem is when doing the same to divide between another container. Here is a working example of my problem:
const std::size_t stabilitySize = 13;
boost::array<double,stabilitySize> secondHalf;
double fundamental = 707;
boost::array<double, stabilitySize> indexes = {{3,4,5,6,7,8,9,10,11,12,13,14,15}};
std::transform(indexes.begin(), indexes.end(), secondHalf.begin(),
std::bind1st(std::divides<double>(),fundamental));
It does divide the container. But instead of dividing each element in the array against 707 it divides 707 between each element in the array.
std::bind1st(std::divides<double>(),fundamental)
The code above takes a functor std::divides<double> that takes two arguments and fixes the value of the first argument to be fundamental. That is, it fixes the numerator of the operation, which is why 707 gets divided by each element. If you want to bind fundamental to be the denominator, use std::bind2nd.
You can try the following. Unlike multiplication, division is not commutative, so binding the first argument of divides divides the constant by each of your elements; multiplying by the reciprocal avoids the problem:
std::bind1st(std::multiplies<double>(),1.0/707.0));
If the number 707.0 is something like a fundamental constant, and a division can be seen as a "conversion", let's call it "x to y" (I don't know what your numbers are representing, so replace this by meaningful words). It would be nice to wrap this "x to y" conversion in a free-standing function for re-usability. Then, use this function on std::transform.
double x_to_y(double x) {
return x / 707.0;
}
...
std::transform(..., x_to_y);
If you have C++11 available, or want to use another lambda library, another option is to write this inline where it is used. You might find this syntax more readable than parameter binding with bind2nd:
std::transform(..., _1 / 707.0); // when using boost::lambda
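With C++11 or later, a plain lambda makes the argument order explicit, which sidesteps the bind1st/bind2nd confusion entirely (both binders were deprecated in C++11 and removed in C++17). A minimal sketch, using std::vector instead of boost::array and a hypothetical helper name divide_each:

```cpp
#include <algorithm>
#include <vector>

// Divides every element of values by divisor and returns the results.
std::vector<double> divide_each(const std::vector<double>& values, double divisor) {
    std::vector<double> out(values.size());
    std::transform(values.begin(), values.end(), out.begin(),
                   // x is the element (numerator); divisor is the captured denominator.
                   [divisor](double x) { return x / divisor; });
    return out;
}
```

Swapping the operands inside the lambda (divisor / x) reproduces the behaviour the question observed with bind1st.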

for_each and transform

for_each(ivec.begin(),ivec.end(),
[]( int& a)->void{ a = a < 0 ? -a : a;
});
transform(ivec.begin(),ivec.end(),ivec.begin(),
[](int a){return a < 0 ? -a : a;
});
I am currently learning lambdas and I am curious how the two implementations, that I have posted above, differ?
The two implementations you show do not differ logically (assuming you get the first version right by adding a return). The first one modifies elements in place while the last one overwrites its elements with new values.
The biggest difference I see is that with transform you can just pass abs instead of a lambda that reimplements it.
transform is what would, in a functional language, be called map. That is, it applies a function to every element in the input range, and stores the output into an output range. (So it is generally intended to not modify the inputs, and instead store a range of outputs)
for_each simply discards the return value from the applied function (so it might modify the inputs).
That's the main difference. They are similar, but designed for different purposes.
This first version:
for_each(ivec.begin(),ivec.end(),
[]( int& a)->void{ a = a < 0 ? -a : a;
});
works by calling the lambda function
[]( int& a)->void{ a = a < 0 ? -a : a; }
once for every element in the range, passing in the elements in the range as arguments. Accordingly, it updates the elements in-place by directly changing their values.
This second version:
transform(ivec.begin(),ivec.end(),ivec.begin(),
[](int a){return a < 0 ? -a : a;
});
works by applying the lambda function
[](int a){return a < 0 ? -a : a;}
to each of the elements in the range ivec.begin() to ivec.end(), generating a series of values, and then writing those values back to the range starting at ivec.begin(). This means that it overwrites the original contents of the range with the range of values produced by applying the function to each array element, so the elements are overwritten rather than modified in-place. The net effect is the same as the original for_each, though.
Hope this helps!
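To make the distinction concrete, here is a small sketch with both approaches wrapped in hypothetical helper functions (abs_in_place and abs_copy are names made up for this example). The transform version writes to a separate output vector to show that, unlike for_each with an int& parameter, it does not need to touch its input.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// In-place: the lambda takes int& and mutates each element directly.
void abs_in_place(std::vector<int>& v) {
    std::for_each(v.begin(), v.end(), [](int& a) { a = a < 0 ? -a : a; });
}

// Copying: the input is left untouched and results go to a separate vector.
std::vector<int> abs_copy(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    // std::abs is overloaded, so it is wrapped in a lambda to pick the int version.
    std::transform(v.begin(), v.end(), out.begin(),
                   [](int a) { return std::abs(a); });
    return out;
}
```

Passing v.begin() as the output iterator of transform, as in the question, gives the same end result as abs_in_place.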

How do you use ranges in D?

Whenever I try to use ranges in D, I fail miserably.
What is the proper way to use ranges in D? (See inline comments for my confusion.)
void print(R)(/* ref? auto ref? neither? */ R r)
{
foreach (x; r)
{
writeln(x);
}
// Million $$$ question:
//
// Will I get back the same things as last time?
// Do I have to check for this every time?
foreach (x; r)
{
writeln(x);
}
}
void test2(alias F, R)(/* ref/auto ref? */ R items)
{
// Will it consume items?
// _Should_ it consume items?
// Will the caller be affected? How do I know?
// Am I supposed to?
F(items);
}
You should probably read this tutorial on ranges if you haven't.
When a range will and won't be consumed depends on its type. If it's an input range and not a forward range (e.g. if it's an input stream of some kind; std.stdio's File.byLine would be one example of this), then iterating over it in any way, shape, or form will consume it.
//Will consume
auto result = find(inRange, needle);
//Will consume
foreach(e; inRange) {}
If it's a forward range and it's a reference type, then it will be consumed whenever you iterate over it, but you can call save to get a copy of it, and consuming the copy won't consume the original (nor will consuming the original consume the copy).
//Will consume
auto result = find(refRange, needle);
//Will consume
foreach(e; refRange) {}
//Won't consume
auto result = find(refRange.save, needle);
//Won't consume
foreach(e; refRange.save) {}
Where things get more interesting is forward ranges which are value types (or arrays). They act the same as any forward range with regards to save, but they differ in that simply passing them to a function or using them in a foreach implicitly saves them.
//Won't consume
auto result = find(valRange, needle);
//Won't consume
foreach(e; valRange) {}
//Won't consume
auto result = find(valRange.save, needle);
//Won't consume
foreach(e; valRange.save) {}
So, if you're dealing with an input range which isn't a forward range, it will be consumed regardless. And if you're dealing with a forward range, you need to call save if you want to guarantee that it isn't consumed; otherwise, whether it's consumed or not depends on its type.
With regards to ref, if you declare a range-based function to take its argument by ref, then it won't be copied, so it won't matter whether the range passed in is a reference type or not, but it does mean that you can't pass an rvalue, which would be really annoying, so you probably shouldn't use ref on a range parameter unless you actually need it to always mutate the original (e.g. std.range.popFrontN takes a ref because it explicitly mutates the original rather than potentially operating on a copy).
As for calling range-based functions with forward ranges, value type ranges are most likely to work properly, since far too often, code is written and tested with value type ranges and isn't always properly tested with reference types. Unfortunately, this includes Phobos' functions (though that will be fixed; it just hasn't been properly tested for in all cases yet - if you run into any cases where a Phobos function doesn't work properly with a reference type forward range, please report it). So, reference type forward ranges don't always work as they should.
Sorry, I can't fit this into a comment :D. Consider if Range were defined this way:
interface Range {
void doForeach(void delegate() myDel);
}
And your function looked like this:
void myFunc(Range r) {
r.doForeach(() {
//blah
});
}
You wouldn't expect anything strange to happen when you reassigned r, nor would you expect to be able to modify the caller's Range. I think the problem is that you are expecting your template function to be able to account for all of the variation in range types, while still taking advantage of the specialization. That doesn't work. You can apply a contract to the template to take advantage of the specialization, or use only the general functionality.
Does this help at all?
Edit (what we've been talking about in comments):
void funcThatDoesntRuinYourRanges(R)(R r)
if (isForwardRange!R) {
//do some stuff
}
Edit 2: Going by the std.range documentation, it looks like isForwardRange simply checks whether save is defined, and save is just a primitive that makes a sort of un-linked copy of the range. The docs specify that save is not defined for e.g. files and sockets.
The short of it: ranges are consumed. This is what you should expect and plan for.
The ref on the foreach plays no role in this, it only relates to the value returned by the range.
The long: ranges are consumed, but may get copied. You'll need to look at the documentation to decide what will happen. Value types get copied, and thus a range may not be modified when passed to a function, but you cannot rely on this just because the range is a struct, as the underlying data stream may be a reference, e.g. FILE. And of course a ref function parameter will add to the confusion.
Say your print function looks like this:
void print(R)(R r) {
foreach (x; r) {
writeln(x);
}
}
Here, r is passed into the function using reference semantics, using the generic type R: so you don't need ref here (and auto will give a compilation error). Otherwise, this will print the contents of r, item-by-item. (I seem to remember there being a way to constrain the generic type to that of a range, because ranges have certain properties, but I forget the details!)
Anyway:
auto myRange = [1, 2, 3];
print(myRange);
print(myRange);
...will output:
1
2
3
1
2
3
If you change your function to (presuming x++ makes sense for your range):
void print(R)(R r) {
foreach (x; r) {
x++;
writeln(x);
}
}
...then each element will be increased before being printed, but this is using copy semantics. That is, the original values in myRange won't be changed, so the output will be:
2
3
4
2
3
4
If, however, you change your function to:
void print(R)(R r) {
foreach (ref x; r) {
x++;
writeln(x);
}
}
...then the x is reverted to reference semantics, which refer to the original elements of myRange. Hence the output will now be:
2
3
4
3
4
5