How do you use ranges in D?

Whenever I try to use ranges in D, I fail miserably.
What is the proper way to use ranges in D? (See inline comments for my confusion.)
void print(R)(/* ref? auto ref? neither? */ R r)
{
foreach (x; r)
{
writeln(x);
}
// Million $$$ question:
//
// Will I get back the same things as last time?
// Do I have to check for this every time?
foreach (x; r)
{
writeln(x);
}
}
void test2(alias F, R)(/* ref/auto ref? */ R items)
{
// Will it consume items?
// _Should_ it consume items?
// Will the caller be affected? How do I know?
// Am I supposed to?
F(items);
}

You should probably read this tutorial on ranges if you haven't.
Whether a range will be consumed depends on its type. If it's an input range and not a forward range (e.g. if it's an input stream of some kind; std.stdio.byLine would be one example of this), then iterating over it in any way, shape, or form will consume it.
//Will consume
auto result = find(inRange, needle);
//Will consume
foreach(e; inRange) {}
If it's a forward range and it's a reference type, then it will be consumed whenever you iterate over it, but you can call save to get a copy of it, and consuming the copy won't consume the original (nor will consuming the original consume the copy).
//Will consume
auto result = find(refRange, needle);
//Will consume
foreach(e; refRange) {}
//Won't consume
auto result = find(refRange.save, needle);
//Won't consume
foreach(e; refRange.save) {}
Where things get more interesting is forward ranges which are value types (or arrays). They act the same as any forward range with regards to save, but they differ in that simply passing them to a function or using them in a foreach implicitly saves them.
//Won't consume
auto result = find(valRange, needle);
//Won't consume
foreach(e; valRange) {}
//Won't consume
auto result = find(valRange.save, needle);
//Won't consume
foreach(e; valRange.save) {}
So, if you're dealing with an input range which isn't a forward range, it will be consumed regardless. And if you're dealing with a forward range, you need to call save if you want to guarantee that it isn't consumed; otherwise, whether it's consumed or not depends on its type.
With regards to ref, if you declare a range-based function to take its argument by ref, then it won't be copied, so it won't matter whether the range passed in is a reference type or not, but it does mean that you can't pass an rvalue, which would be really annoying, so you probably shouldn't use ref on a range parameter unless you actually need it to always mutate the original (e.g. std.range.popFrontN takes a ref because it explicitly mutates the original rather than potentially operating on a copy).
As for calling range-based functions with forward ranges, value type ranges are most likely to work properly, since far too often, code is written and tested with value type ranges and isn't always properly tested with reference types. Unfortunately, this includes Phobos' functions (though that will be fixed; it just hasn't been properly tested for in all cases yet - if you run into any cases where a Phobos function doesn't work properly with a reference type forward range, please report it). So, reference type forward ranges don't always work as they should.

Sorry, I can't fit this into a comment :D. Consider if Range were defined this way:
interface Range {
void doForeach(void delegate() myDel);
}
And your function looked like this:
void myFunc(Range r) {
r.doForeach(() {
//blah
});
}
You wouldn't expect anything strange to happen when you reassigned r, nor would you expect
to be able to modify the caller's Range. I think the problem is that you are expecting your template function to be able to account for all of the variation in range types, while still taking advantage of the specialization. That doesn't work. You can apply a contract to the template to take advantage of the specialization, or use only the general functionality.
Does this help at all?
Edit (what we've been talking about in comments):
void funcThatDoesntRuinYourRanges(R)(R r)
if (isForwardRange!R) {
//do some stuff
}
Edit 2: Looking at std.range, it looks like isForwardRange simply checks whether save is defined, and save is just a primitive that makes a sort of un-linked copy of the range. The docs specify that save is not defined for e.g. files and sockets.

The short of it: ranges are consumed. This is what you should expect and plan for.
The ref on the foreach plays no role in this; it only relates to the value returned by the range.
The long: ranges are consumed, but may get copied. You'll need to look at the documentation to decide what will happen. Value types get copied, and thus a range may not be modified when passed to a function, but you cannot rely on that just because the range is a struct, as the underlying data stream may be a reference, e.g. FILE. And of course a ref function parameter will add to the confusion.

Say your print function looks like this:
void print(R)(R r) {
foreach (x; r) {
writeln(x);
}
}
Here, r is passed into the function using reference semantics, using the generic type R: so you don't need ref here (and auto will give a compilation error). Otherwise, this will print the contents of r, item-by-item. (I seem to remember there being a way to constrain the generic type to that of a range, because ranges have certain properties, but I forget the details!)
Anyway:
auto myRange = [1, 2, 3];
print(myRange);
print(myRange);
...will output:
1
2
3
1
2
3
If you change your function to (presuming x++ makes sense for your range):
void print(R)(R r) {
foreach (x; r) {
x++;
writeln(x);
}
}
...then each element will be increased before being printed, but this is using copy semantics. That is, the original values in myRange won't be changed, so the output will be:
2
3
4
2
3
4
If, however, you change your function to:
void print(R)(R r) {
foreach (ref x; r) {
x++;
writeln(x);
}
}
...then the x is reverted to reference semantics, which refer to the original elements of myRange. Hence the output will now be:
2
3
4
3
4
5

Related

a pushBack() function, as opposite to popFront()

Can I use popFront() and then eventually push back what was popped? The number of calls to popFront() might be more than one (but not much greater, say < 10, if that matters). The imaginary pushBack() function would be called the same number of times.
for example:
string s = "Hello, World!";
int n = 5;
foreach(i; 0 .. n) {
// do something with s.front
s.popFront();
}
if(some_condition) {
foreach(i; 0 .. n) {
s.pushBack();
}
}
writeln(s); // should output "Hello, World!" since the number popped is the same as the number pushed back.
I think popFront() uses .ptr, but I'm not sure whether that makes any difference in D or can help me reach my goal easily (i.e., the D way, without writing my own circular buffer or similar).
A completely different approach to reach it is very welcome too.
A range is either generative (e.g. if it's a list of random numbers), or it's a view into a container. In neither case does it make sense to push anything onto it. As you call popFront, you're iterating through the list and shrinking your view of the container. If you think of a range being like two C++ iterators for a moment, and you have something like
struct IterRange(T)
{
@property bool empty() { return iter == end; }
@property T front() { return *iter; }
void popFront() { ++iter; }
private Iterator iter;
private Iterator end;
}
then it will be easier to understand. If you called popFront, it would move the iterator forward by one, thereby changing which element you're looking at, but you can't add elements in front of it. That would require doing something like an insertion on the container itself, and maybe the iterator or range could be used to tell the container where you want an element inserted, but the iterator or range can't do that itself. The same goes if you have a generative range like
struct IncRange(T)
{
@property bool empty() { return value == T.max; }
@property T front() { return value; }
void popFront() { ++value; }
private T value;
}
It keeps incrementing the value, and there is no container backing it. So, it doesn't even have anywhere that you could push a value onto.
Arrays are a little bit funny because they're ranges but they're also containers (sort of). They have range semantics when popping elements off of them or slicing them, but they don't own their own memory, and once you append to them, you can get a completely different chunk of memory with the same values. So, it is sort of a range that you can add and remove elements from - but you can't do it using the range API. So, you could do something like
str = newChar ~ str;
but that's not terribly efficient. You could make it more efficient by creating a new array at the target size and then filling in its elements rather than concatenating repeatedly, but regardless, pushing something onto the front of an array is not a particularly idiomatic or efficient thing to be doing.
Now, if what you're looking to do is just reset the range so that it once again refers to the elements that were popped off rather than really push elements onto it - that is, open up the window again so that it shows what it showed before - that's a bit different. It's still not supported by the range API at all (you can never unpop anything that was popped off). However, if the range that you're dealing with is a forward range (and arrays are), then you can save the range before you pop off the elements and then use that to restore the previous state. e.g.
string s = "Hello, World!";
int n = 5;
auto saved = s.save;
foreach(i; 0 .. n)
s.popFront();
if(some_condition)
s = saved;
So, you have to explicitly store the previous state yourself in order to restore it instead of having something like unpopFront, but having the range store that itself (as would be required for unpopFront) would be very inefficient in most cases (much as it might work in the iterator case if the range kept track of where the beginning of the container was).
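The two-iterator analogy from this answer can be made concrete in C++. The following is only a sketch: IterRange here is specialized to std::string, and save and popThenMaybeRestore are hypothetical names chosen to mirror the D code above. It shows that popping only narrows the view, so a copy taken beforehand is all that is needed to "unpop".

```cpp
#include <string>

// Hypothetical minimal "range": a view over a string made of two iterators.
struct IterRange {
    std::string::const_iterator iter;
    std::string::const_iterator end;

    bool empty() const { return iter == end; }
    char front() const { return *iter; }
    void popFront() { ++iter; }               // narrows the view; nothing is erased
    IterRange save() const { return *this; }  // copying the two iterators is enough
};

// Pops up to n elements, then optionally restores the view from the saved copy.
// Returns the characters the range still covers afterwards.
std::string popThenMaybeRestore(const std::string& s, int n, bool restore) {
    IterRange r{s.begin(), s.end()};
    IterRange saved = r.save();
    for (int i = 0; i < n && !r.empty(); ++i)
        r.popFront();
    if (restore)
        r = saved;  // the only way back: the range itself cannot "unpop"
    return std::string(r.iter, r.end);
}
```

With the string from the question, popThenMaybeRestore("Hello, World!", 5, false) leaves ", World!" and passing true gives back the whole string.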
No, there is no standard way to "unpop" a range or a string.
If you were to pass a slice of a string to a function:
fun(s[5..10]);
You'd expect that that function would only be able to see those 5 characters. If there was a way to "unpop" the slice, the function would be able to see the entire string.
Now, D is a system programming language, so expanding a slice is possible using pointer arithmetic and GC queries. But there is nothing in the standard library to do this for you.

C++ function to tell whether a given function is injective

This might seem like a weird question, but how would I create a C++ function that tells whether a given C++ function that takes as a parameter a variable of type X and returns a variable of type X, is injective in the space of machine representation of those variables, i.e. never returns the same variable for two different variables passed to it?
(For those of you who weren't Math majors, maybe check out this page if you're still confused about the definition of injective: http://en.wikipedia.org/wiki/Injective_function)
For instance, the function
double square(double x) { return x*x; }
is not injective since square(2.0) = square(-2.0),
but the function
double cube(double x) { return x*x*x; }
is, obviously.
The goal is to create a function
template <typename T>
bool is_injective(T(*foo)(T))
{
/* Create a set std::set<T> retVals;
For each element x of type T:
if foo(x) is in retVals, return false;
if foo(x) is not in retVals, add it to retVals;
Return true if we made it through the above loop.
*/
}
I think I can implement that procedure except that I'm not sure how to iterate through every element of type T. How do I accomplish that?
Also, what problems might arise in trying to create such a function?
You need to test every possible bit pattern of length sizeof(T) bytes, i.e. 8 * sizeof(T) bits.
There was a widely circulated blog post about this topic recently: There are Only Four Billion Floats - So Test Them All!
In that post, the author was able to test all 32-bit floats in 90 seconds. Turns out that would take a few centuries for 64-bit values.
So this is only possible with small input types.
Multiple inputs, structs, or anything with pointers are going to get impossible fast.
BTW, even with 32-bit values you will probably exhaust system memory trying to store all the output values in a std::set, because std::set uses a lot of extra memory for pointers. Instead, you should use a bitmap that's big enough to hold all 2^(8 * sizeof(T)) output values (sizeof counts bytes, not bits). The specialized std::vector<bool> should work. That will take 2^(8 * sizeof(T)) / 8 bytes of memory, which is 512 MiB for a 32-bit type.
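The bitmap idea can be sketched for a small type. This is a minimal illustration, not a general solution: it assumes a 16-bit input type so the whole run stays tiny, and is_injective16, add_one and halve are hypothetical names introduced only for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exhaustively tests a function on every 16-bit input, marking each output
// in a bitmap. std::uint16_t is an assumption made to keep the search small.
bool is_injective16(std::uint16_t (*foo)(std::uint16_t)) {
    std::vector<bool> seen(std::size_t{1} << 16, false);  // one bit per possible output
    for (std::uint32_t i = 0; i < (1u << 16); ++i) {
        std::uint16_t out = foo(static_cast<std::uint16_t>(i));
        if (seen[out])
            return false;  // two different inputs produced the same output
        seen[out] = true;
    }
    return true;
}

// Hypothetical sample functions for illustration.
// Adding one wraps around at 65535, so it permutes the 16-bit values.
std::uint16_t add_one(std::uint16_t x) { return static_cast<std::uint16_t>(x + 1); }
// Integer halving sends both 0 and 1 to 0, so it is not injective.
std::uint16_t halve(std::uint16_t x) { return static_cast<std::uint16_t>(x / 2); }
```

The loop counter is deliberately wider than the type under test, so it can reach the last value without wrapping around.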
Maybe what you need is std::numeric_limits. To store the results, you may use an unordered_map (from std if you're using C++11, or from boost if you're not).
You can check the limits of the data types, maybe something like this might work (it's a dumb solution, but it may get you started):
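#include <limits>
#include <unordered_map>
template <typename T>
bool is_injective(T(*foo)(T))
{
// Keyed on the output: a repeated key means two inputs collided.
std::unordered_map<T, T> hash_table;
// lowest() rather than min(): for floating-point types min() is the smallest positive value.
T min = std::numeric_limits<T>::lowest();
T max = std::numeric_limits<T>::max();
// Note: ++it only visits every value for integer types.
for(T it = min; ; ++it)
{
auto result = hash_table.emplace(foo(it), it);
if(result.second == false)
{
return false;
}
// Test max as well, then stop before the counter overflows.
if(it == max)
{
break;
}
}
return true;
}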
Of course, you may want to restrict a few of the possible data types. Otherwise, if you check for floats, doubles or long integers, it'll get very intensive.
but the function
double cube(double x) { return x*x*x; }
is, obviously.
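It is obviously not. There are 2^53 more double values representable in [0..0.5) than in [0..0.125), and cube maps the first interval into the second, so by the pigeonhole principle at least two inputs must produce the same output.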
As far as I know, you cannot iterate all possible values of a type in C++.
But, even if you could, that approach would get you nowhere. If your type is a 64 bit integer, you might have to iterate through 2^64 values and keep track of the result for all of them, which is not possible.
Like other people said, there is no solution for a generic type X.

How do you create an array of member function pointers with arguments?

I am trying to create a jump table for a fuzzy controller. Basically, I have a lot of functions that take in a string and return a float, and I want to be able to do something along the lines:
float Defuzzify(std::string varName, DefuzzificationMethod defuzz)
{
return functions[defuzz](varName);
}
where DefuzzificationMethod is an enum. The objective is to avoid a switch statement and have a O(1) operation.
What I have right now is:
float CenterOfGravity(std::string varName);
std::vector<std::function<float (std::string)>> defuzzifiers;
Then I try to initialize it in the constructor with:
defuzzifiers.reserve(NUMBER_OF_DEFUZZIFICATION_METHODS);
defuzzifiers[DEFUZZ_COG] = std::bind(&CenterOfGravity, std::placeholders::_1);
This is making the compiler throw about 100 errors about enable_if (which I don't use anywhere, so I assume std does). Is there a way to make this compile ? Moreover, is there a way to make this a static vector, since every fuzzy controller will essentially have the same vector ?
Thanks in advance
Reserve just makes sure there's enough capacity; it doesn't actually make the vector's size big enough. What you want to do is:
// construct a vector of the correct size
std::vector<std::function<float (std::string)>> defuzzifiers(NUMBER_OF_DEFUZZIFICATION_METHODS);
// now assign into it...
// if CenterOfGravity is a free function, just simple = works
defuzzifiers[DEFUZZ_COG] = CenterOfGravity;
// if it's a method
defuzzifiers[DEFUZZ_COG] = std::bind(&ThisType::CenterOfGravity, this, std::placeholders::_1);
Now this might leave you some holes which don't actually have a function defined, so maybe you want to provide a default function of sorts, which the vector constructor allows too
std::vector<std::function<float (std::string)>> defuzzifiers(
NUMBER_OF_DEFUZZIFICATION_METHODS,
[](std::string x) { return 0.0f; }
);
An unrelated note, you probably want your functions to take strings by const-ref and not by value, as copying strings is expensive.
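Putting the pieces together, here is one possible sketch of the jump table, including a shared (static) table as asked in the question. Several things are assumptions made for illustration: the second enumerator DEFUZZ_MAX, the MaxMembership function, the placeholder function bodies, the use of std::array in place of std::vector, and the function-local static used to share the table between controllers.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>

// DEFUZZ_MAX is a hypothetical second method, added only to fill the table.
enum DefuzzificationMethod { DEFUZZ_COG, DEFUZZ_MAX, NUMBER_OF_DEFUZZIFICATION_METHODS };

// Placeholder implementations; real ones would evaluate the fuzzy variable.
float CenterOfGravity(const std::string& varName) { return static_cast<float>(varName.size()); }
float MaxMembership(const std::string&) { return 1.0f; }

using Defuzzifier = std::function<float(const std::string&)>;

// Function-local static: built once on first use and shared by every caller.
const std::array<Defuzzifier, NUMBER_OF_DEFUZZIFICATION_METHODS>& defuzzifiers() {
    static const std::array<Defuzzifier, NUMBER_OF_DEFUZZIFICATION_METHODS> table = {{
        CenterOfGravity,  // DEFUZZ_COG
        MaxMembership,    // DEFUZZ_MAX
    }};
    return table;
}

// O(1) dispatch: the enum value indexes straight into the table.
float Defuzzify(const std::string& varName, DefuzzificationMethod defuzz) {
    return defuzzifiers()[defuzz](varName);
}
```

Because the array has exactly NUMBER_OF_DEFUZZIFICATION_METHODS slots, every enum value has an entry and there is no reserve-versus-size pitfall.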

boost lambda collection size evaluation

I have a function of the form:
void DoSomething(const boost::function<bool ()>& condition, other stuff);
This function does some work and returns only when the condition is true. The condition has been expressed as a functor argument because I want to supply different conditions at different call sites.
Now, this is fairly straightforward to use directly, but it requires declaring lots of little throwaway functions or functor objects, which I'd like to avoid if possible. I've been looking at Boost's lambda library for possible ways to do away with these, but I think I'm missing something fundamental; I just can't get it to do what I want.
One case that's stumped me at the moment: I have a std::vector collection called data; the condition that I'm after is when the size() of that collection reaches a certain threshold. Essentially, then, I want my condition functor to return true when data.size() >= threshold and false otherwise. But I've been having trouble expressing that in lambda syntax.
The best that I've been able to come up with thus far (which at least compiles, though it doesn't work) is this:
boost::function<bool (size_t)> ge = boost::bind(std::greater_equal<size_t>(),
_1, threshold);
boost::function<size_t ()> size = boost::bind(&std::vector<std::string>::size,
data);
DoSomething(boost::lambda::bind(ge, boost::lambda::bind(size)), other stuff);
On entry to DoSomething, the size is 0 -- and even though the size increases during the course of running, the calls to condition() always seem to get a size of 0. Tracing it through (which is a bit tricky through Boost's internals), while it does appear to be calling greater_equal each time condition() is evaluated, it doesn't appear to be calling size().
So what fundamental thing have I completely messed up? Is there a simpler way of expressing this sort of thing (while still keeping the code as inline as possible)?
I'd ideally like to get it as close as possible to the C# equivalent code fluency:
DoSomething(delegate() { return data.size() >= threshold; }, other stuff);
DoSomething(() => (data.size() >= threshold), other stuff);
The problem is, that the lambda function stores a copy of the data vector, not a reference. So size() is called on the copy, not the original object that you are modifying. This can be solved by wrapping data with boost::ref, which stores a reference instead:
boost::function<size_t ()> size = boost::bind(&std::vector<std::string>::size,
boost::ref(data));
You can also use the normal >= operator instead of std::greater_equal<> in the definition of your lambda function and combine it all together:
boost::function<bool ()> cond =
(boost::bind(&std::vector<std::string>::size, boost::ref(data))
>= threshold);
DoSomething(cond, other stuff);
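If a C++11 compiler is available (an assumption; the question predates it), the same copy-versus-reference distinction shows up in lambda capture lists. The sketch below uses std::function instead of boost::function, and fillUntil is a hypothetical stand-in for DoSomething that appends items until the condition holds or a safety limit of 100 iterations is hit.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for DoSomething: appends items until the condition holds
// (or the safety limit is reached) and returns the final size.
std::size_t fillUntil(std::vector<std::string>& data,
                      const std::function<bool()>& condition) {
    for (int i = 0; i < 100 && !condition(); ++i)
        data.push_back("item");
    return data.size();
}

std::size_t runWithReferenceCapture(std::size_t threshold) {
    std::vector<std::string> data;
    // [&data] refers to the live vector, so size() reflects every push_back.
    return fillUntil(data, [&data, threshold] { return data.size() >= threshold; });
}

std::size_t runWithCopyCapture(std::size_t threshold) {
    std::vector<std::string> data;
    // [data] stores a snapshot (empty here), reproducing the bug in the question:
    // the condition never becomes true and only the safety limit stops the loop.
    return fillUntil(data, [data, threshold] { return data.size() >= threshold; });
}
```

Capturing by reference plays the role of boost::ref here, with the usual caveat that the vector must outlive the lambda.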

Boost::Tuples vs Structs for return values

I'm trying to get my head around tuples (thanks @litb), and the common suggestion for their use is for functions returning > 1 value.
This is something that I'd normally use a struct for, and I can't understand the advantages to tuples in this case - it seems an error-prone approach for the terminally lazy.
Borrowing an example, I'd use this
struct divide_result {
int quotient;
int remainder;
};
Using a tuple, you'd have
typedef boost::tuple<int, int> divide_result;
But without reading the code of the function you're calling (or the comments, if you're dumb enough to trust them) you have no idea which int is quotient and vice-versa. It seems rather like...
struct divide_result {
int results[2]; // 0 is quotient, 1 is remainder, I think
};
...which wouldn't fill me with confidence.
So, what are the advantages of tuples over structs that compensate for the ambiguity?
tuples
I think I agree with you that the issue with what position corresponds to what variable can introduce confusion. But I think there are two sides. One is the call-side and the other is the callee-side:
int remainder;
int quotient;
tie(quotient, remainder) = div(10, 3);
I think it's crystal clear what we got, but it can become confusing if you have to return more values at once. Once the caller's programmer has looked up the documentation of div, he will know what position is what, and can write effective code. As a rule of thumb, I would say not to return more than 4 values at once. For anything beyond, prefer a struct.
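For reference, the tie snippet above can be fleshed out with the standard library equivalents. This is a sketch that assumes C++11 std::tuple and std::tie in place of the Boost versions; divide and recombine are hypothetical names (divide avoids clashing with std::div).

```cpp
#include <tuple>

// Hypothetical divide() returning (quotient, remainder), in that order.
std::tuple<int, int> divide(int dividend, int divisor) {
    return std::make_tuple(dividend / divisor, dividend % divisor);
}

// Unpacks the tuple into named locals with std::tie, then recombines them
// to show that both values arrived in the expected positions.
int recombine(int dividend, int divisor) {
    int quotient;
    int remainder;
    std::tie(quotient, remainder) = divide(dividend, divisor);
    return quotient * divisor + remainder;
}
```

The names at the call site document the positions, but nothing stops a caller from writing the two variables in the wrong order, which is the ambiguity the question is worried about.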
output parameters
Output parameters can be used too, of course:
int remainder;
int quotient;
div(10, 3, &quotient, &remainder);
Now I think that illustrates how tuples are better than output parameters. We have mixed the input of div with its output, while not gaining any advantage. Worse, we leave the reader of that code in doubt about what the actual return value of div might be. There are wonderful examples where output parameters are useful. In my opinion, you should use them only when you've got no other way, because the return value is already taken and can't be changed to either a tuple or struct. operator>> is a good example of where you use output parameters, because the return value is already reserved for the stream, so you can chain operator>> calls. If you are not dealing with operators, and the context is not crystal clear, I recommend you use pointers, to signal at the call side that the object is actually used as an output parameter, in addition to comments where appropriate.
returning a struct
The third option is to use a struct:
div_result d = div(10, 3);
I think that definitely wins the award for clarity. But note you still have to access the result within that struct, and the result is not "laid bare" on the table, as was the case for the output parameters and the tuple used with tie.
I think a major point these days is to make everything as generic as possible. So, say you have got a function that can print out tuples. You can just do
cout << div(10, 3);
And have your result displayed. I think that tuples, on the other hand, clearly win for their versatile nature. Doing that with div_result, you need to overload operator<<, or need to output each member separately.
Another option is to use a Boost Fusion map (code untested):
struct quotient;
struct remainder;
using boost::fusion::map;
using boost::fusion::pair;
typedef map<
pair< quotient, int >,
pair< remainder, int >
> div_result;
You can access the results relatively intuitively:
using boost::fusion::at_key;
res = div(x, y);
int q = at_key<quotient>(res);
int r = at_key<remainder>(res);
There are other advantages too, such as the ability to iterate over the fields of the map, etc etc. See the doco for more information.
With tuples, you can use tie, which is sometimes quite useful: std::tr1::tie (quotient, remainder) = do_division ();. This is not so easy with structs. Second, when using template code, it's sometimes easier to rely on pairs than to add yet another typedef for the struct type.
And if the types are different, then a pair/tuple is really no worse than a struct. Think for example pair<int, bool> readFromFile(), where the int is the number of bytes read and bool is whether the eof has been hit. Adding a struct in this case seems like overkill for me, especially as there is no ambiguity here.
Tuples are very useful in languages such as ML or Haskell.
In C++, their syntax makes them less elegant, but can be useful in the following situations:
you have a function that must return more than one argument, but the result is "local" to the caller and the callee; you don't want to define a structure just for this
you can use the tie function to do a very limited form of pattern matching "a la ML", which is more elegant than using a structure for the same purpose.
they come with predefined < operators, which can be a time saver.
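The point about predefined < operators can be illustrated with a short sketch. It assumes std::tuple rather than boost::tuple, and grid_index, GridPos and comesBefore are names chosen only for the example.

```cpp
#include <tuple>

typedef std::tuple<int, int> grid_index;  // (row, column)

// Tuples compare lexicographically out of the box: rows first, then columns.
bool comesBefore(const grid_index& a, const grid_index& b) {
    return a < b;
}

// A struct needs the operator written by hand; std::tie lets it borrow
// the tuple ordering instead of spelling out the comparisons.
struct GridPos {
    int row;
    int column;
};

bool operator<(const GridPos& a, const GridPos& b) {
    return std::tie(a.row, a.column) < std::tie(b.row, b.column);
}
```

The std::tie trick narrows the gap somewhat: a struct keeps its named members and still gets the ordering in one line.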
I tend to use tuples in conjunction with typedefs to at least partially alleviate the 'nameless tuple' problem. For instance if I had a grid structure then:
//row is element 0 column is element 1
typedef boost::tuple<int,int> grid_index;
Then I use the named type as :
grid_index find(const grid& g, int value);
This is a somewhat contrived example but I think most of the time it hits a happy medium between readability, explicitness, and ease of use.
Or in your example:
//quotient is element 0 remainder is element 1
typedef boost::tuple<int,int> div_result;
div_result div(int dividend,int divisor);
One feature of tuples that you don't have with structs is in their initialization. Consider something like the following:
struct A
{
int a;
int b;
};
Unless you write a make_tuple equivalent or constructor then to use this structure as an input parameter you first have to create a temporary object:
void foo (A const & a)
{
// ...
}
void bar ()
{
A dummy = { 1, 2 };
foo (dummy);
}
Not too bad, however, take the case where maintenance adds a new member to our struct for whatever reason:
struct A
{
int a;
int b;
int c;
};
The rules of aggregate initialization actually mean that our code will continue to compile without change. We therefore have to search for all usages of this struct and update them, without any help from the compiler.
Contrast this with a tuple:
typedef boost::tuple<int, int, int> Tuple;
enum {
A
, B
, C
};
void foo (Tuple const & p) {
}
void bar ()
{
foo (boost::make_tuple (1, 2)); // Compile error
}
The compiler cannot initialize "Tuple" with the result of make_tuple, and so generates the error that allows you to specify the correct values for the third parameter.
Finally, the other advantage of tuples is that they allow you to write code which iterates over each value. This is simply not possible using a struct.
void incrementValues (boost::tuples::null_type) {}
template <typename Tuple_>
void incrementValues (Tuple_ & tuple) {
// ...
++tuple.get_head ();
incrementValues (tuple.get_tail ());
}
Tuples prevent your code from being littered with many struct definitions. It's easier for the person writing the code, and for others using it, when you just document what each element in the tuple is, rather than writing your own struct and making people look up the struct definition.
Tuples will be easier to write - no need to create a new struct for every function that returns something. Documentation about what goes where will go to the function documentation, which will be needed anyway. To use the function one will need to read the function documentation in any case and the tuple will be explained there.
I agree with you 100% Roddy.
To return multiple values from a method, you have several options other than tuples, which one is best depends on your case:
Creating a new struct. This is good when the multiple values you're returning are related, and it's appropriate to create a new abstraction. For example, I think "divide_result" is a good general abstraction, and passing this entity around makes your code much clearer than just passing a nameless tuple around. You could then create methods that operate on this new type, convert it to other numeric types, etc.
Using "Out" parameters. Pass several parameters by reference, and return multiple values by assigning to each out parameter. This is appropriate when your method returns several unrelated pieces of information. Creating a new struct in this case would be overkill, and with Out parameters you emphasize this point, plus each item gets the name it deserves.
Tuples are Evil.