Unit Testing a large method - unit-testing

Following Test-Driven Development that is.
I've recently implemented an algorithm (A*) that required a clean interface. By clean I mean all I want is a couple of properties and a single search method.
What I've found hard is testing the search method. It contains around five steps, but I'm essentially forced to code this method in one big go, which makes things hard.
Is there any advice for this?
Edit
I'm using C#. No, I don't have the code at hand at the moment. My problem lies in the fact that a test only passes after implementing the whole search method, rather than a step of the algorithm. I naturally refactored the code afterwards, but it was the implementing I found hard.

If your steps are large enough (or are meaningful in their own right) you should consider delegating them to other, smaller classes and testing the interaction between your class and them. For example, if you have a parsing step followed by a sorting step followed by a searching step, it could be meaningful to have a parser class, a sorter class, etc. You would then use TDD on each of those.
No idea which language you're using, but if you're in the .NET world you could make these classes internal and then expose them to your test assembly with InternalsVisibleTo, which would keep them hidden.
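A minimal sketch of that idea in C#, assuming a hypothetical step class and test assembly name (neither comes from the original question):

using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Expose internal step classes to a hypothetical test assembly, as described above.
[assembly: InternalsVisibleTo("PathFinding.Tests")]

namespace PathFinding
{
    // Hypothetical step class: small enough to drive test-first on its own.
    internal class NeighbourExpander
    {
        public IEnumerable<int> Expand(int node)
        {
            yield return node + 1;   // placeholder neighbour generation
        }
    }

    // The public interface stays clean: the search method just wires the steps together.
    public class AStarSearch
    {
        private readonly NeighbourExpander _expander = new NeighbourExpander();

        public IReadOnlyList<int> Search(int start, int goal)
        {
            var path = new List<int> { start };
            foreach (var neighbour in _expander.Expand(start))
            {
                if (neighbour == goal)
                {
                    path.Add(neighbour);
                    break;
                }
            }
            return path;
        }
    }
}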
If the steps are small AND meaningless on their own then tvanfosson's suggestions are the way to go.

Refactor your big method into smaller, private methods. Use reflection, or another mechanism available in your language, to test the smaller methods independently. Worst case -- if you don't have access to reflection or friends, make the methods protected and have your test class inherit from the main class so it can have access to them.
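A rough sketch of the reflection route in C# (class and method names here are made up for illustration, not taken from the question):

using System;
using System.Reflection;

public class AStarSearch
{
    // One refactored step, kept private.
    private int HeuristicEstimate(int node, int goal) => Math.Abs(goal - node);
}

public class AStarSearchTests
{
    public static void Main()
    {
        // Invoke the private step directly via reflection.
        var search = new AStarSearch();
        var method = typeof(AStarSearch).GetMethod(
            "HeuristicEstimate", BindingFlags.NonPublic | BindingFlags.Instance);
        var result = (int)method.Invoke(search, new object[] { 3, 10 });
        Console.WriteLine(result == 7 ? "pass" : "fail");
    }
}

The protected-plus-inheritance approach works the same way, except the test class simply calls the member directly.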
Update: I should also clarify that simply refactoring to private methods doesn't necessarily imply that you need to create tests specific to those methods. If you are thoroughly testing the public methods that rely on the private methods, you may not need to test the private methods directly. This is probably the general case. There are times when it does make sense to test private methods directly (say when it simplifies or reduces the number of test cases needed for public methods), but I wouldn't consider it a requirement to create tests simply because you refactor to a private method implementation.

An important thing to remember when employing TDD is that tests don't need to live forever.
In your example you know you're going to provide a clean interface, and much of the operation will be delegated to private methods. I think it's the assumption that these methods have to be created as private, and stay that way, that causes the most bother. Also the assumption that the tests must stay around once you've developed them.
My advice for this scenario would be to:
Sketch the skeleton of the large method in terms of new methods, so you have a series of steps which make sense to test individually.
Make these new methods public by default, and begin to develop them using TDD.
Then, once you have all the parts tested, write another test that exercises the entire method, ensuring that if any of the smaller, delegated parts is broken, this new test will fail.
At this point the big method works and is covered by tests, so you can now begin to refactor. I'd recommend, as FinnNk does, trying to extract the smaller methods into classes of their own. If this doesn't make sense, or if it would take too much time, I'd recommend something others may not agree with: change the small methods into private ones and delete the tests for them.
The result is that you have used TDD to create the entire method, the method is still covered by the tests, and the API is exactly the one you want to present.
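To make steps 1 and 2 concrete, here is a rough C# skeleton (the method names are invented for the example): each step is public for now and throws until its test drives out the implementation.

using System;
using System.Collections.Generic;

// Skeleton of the big method broken into steps that are public for now,
// so each can be developed test-first; names are illustrative only.
public class AStarSearch
{
    public ISet<int> InitialiseOpenSet(int start) => throw new NotImplementedException();
    public int PickLowestFScore(ISet<int> open, IDictionary<int, int> fScore) => throw new NotImplementedException();
    public IEnumerable<int> ExpandNeighbours(int node) => throw new NotImplementedException();
    public IList<int> ReconstructPath(IDictionary<int, int> cameFrom, int goal) => throw new NotImplementedException();

    // The "big" method just wires the tested steps together; a final test
    // covers this composition end to end before the steps are made private.
    public IList<int> Search(int start, int goal) => throw new NotImplementedException();
}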
A lot of confusion comes from the idea that once you've written a test you always have to keep it, and that is not necessarily the case. However, if you are sentimental, there are ways to unit test private methods in Java, and perhaps there are similar constructs in C#.

I'm assuming that A* means the search algorithm (e.g. http://en.wikipedia.org/wiki/A*_search_algorithm). If so, I understand your problem as we have similar requirements. Here's the WP algorithm, and I'll comment below:
Pseudo code:

function A*(start, goal)
    closedset := the empty set                  % The set of nodes already evaluated.
    openset := set containing the initial node  % The set of tentative nodes to be evaluated.
    g_score[start] := 0                         % Distance from start along optimal path.
    h_score[start] := heuristic_estimate_of_distance(start, goal)
    f_score[start] := h_score[start]            % Estimated total distance from start to goal through y.
    while openset is not empty
        x := the node in openset having the lowest f_score[] value
        if x = goal
            return reconstruct_path(came_from, goal)
        remove x from openset
        add x to closedset
        foreach y in neighbor_nodes(x)
            if y in closedset
                continue
            tentative_g_score := g_score[x] + dist_between(x, y)
            if y not in openset
                add y to openset
                tentative_is_better := true
            elseif tentative_g_score < g_score[y]
                tentative_is_better := true
            else
                tentative_is_better := false
            if tentative_is_better = true
                came_from[y] := x
                g_score[y] := tentative_g_score
                h_score[y] := heuristic_estimate_of_distance(y, goal)
                f_score[y] := g_score[y] + h_score[y]
    return failure

function reconstruct_path(came_from, current_node)
    if came_from[current_node] is set
        p = reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return the empty path
The closed set can be omitted (yielding a tree search algorithm) if a solution is guaranteed to exist, or if the algorithm is adapted so that new nodes are added to the open set only if they have a lower f value than at any previous iteration.
First, and I'm not being frivolous, it depends whether you understand the algorithm - it sounds as if you do. It would also be possible to transcribe the algorithm above (hoping it worked) and give it a number of tests. That's what I would do, as I suspect that the authors of WP are better than me! The large-scale tests would exercise edge cases such as no nodes, one node, two nodes with no edge, etc. If they all passed I would sleep happy. But if they failed there would be no choice but to understand the algorithm.
In that case I think you have to construct tests for the data structures. These are (at least) the set, the distances, the scores, etc. You have to create these objects and test them. What is the expected distance for case 1, 2, 3...? Write tests. What is the effect of adding A to set Z? That needs a test. For this algorithm you also need to test heuristic_estimate_of_distance and so on. It's a lot of work.
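For example, a self-contained C# sketch of that kind of data-structure-level test, assuming a simple Manhattan-distance heuristic (the function here is hypothetical, not the asker's code):

using System;

// Test one piece (a hypothetical heuristic) in isolation before the full search.
public static class HeuristicTests
{
    // Hypothetical heuristic under test: Manhattan distance on a grid.
    static int HeuristicEstimateOfDistance((int x, int y) a, (int x, int y) b)
        => Math.Abs(a.x - b.x) + Math.Abs(a.y - b.y);

    public static void Main()
    {
        // Expected distances for a few hand-worked cases.
        Check(HeuristicEstimateOfDistance((0, 0), (0, 0)) == 0, "same node");
        Check(HeuristicEstimateOfDistance((0, 0), (3, 4)) == 7, "simple offset");
        Check(HeuristicEstimateOfDistance((2, 5), (2, 1)) == 4, "same column");
        Console.WriteLine("all heuristic checks passed");
    }

    static void Check(bool ok, string name)
    {
        if (!ok) throw new Exception("failed: " + name);
    }
}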
One approach may be to find an implementation in another language and interrogate it to find the values in the data structures. Of course if you are modifying the algorithm you are on your own!
There's one thing even worse than this - numerical algorithms. Diagonalizing matrices - do we actually get the right answers? I worked with one scientist writing third-derivative matrices - it would terrify me...

You might consider Texttest (http://texttest.carmen.se/) as a way of testing "under the interface".
It allows you to check behavior by examining logged data to verify behavior, rather than purely black-box-style testing on arguments and method results.
DISCLAIMER: I've heard a presentation on Texttest, and looked at the docs, but haven't yet had the time to try it out in a serious application.


Does partition strategy help with Gremlin traversal performance

I tried to play around with the partition strategy as mentioned here: https://tinkerpop.apache.org/docs/current/reference/. Initially, I expected that when I defined a specific partition key for a zone and wrote some vertices to it, it would index that specific zone and improve the vertex lookup. Eventually, I realized that the partition key is just like another property value defined on a vertex. In other words, this code is nothing more than a property value lookup, which leads to a full graph traversal scan:
g.withStrategies(new PartitionStrategy(partitionKey: "_partition", writePartition: "a",
readPartitions: ["a"]));
I'm not sure what the underlying logic of this PartitionStrategy is, but it does not seem to improve the lookup if it really does a full graph scan. Correct me if I'm wrong.
From TinkerPop's perspective, PartitionStrategy is just automatically modifying your Gremlin to take advantage of a particular property in the graph. TinkerPop doesn't know anything about your graph database's underlying indexing features, nor does it implement any. It is up to your graph to optimize such things. Some graphs might do that on their own, some might offer you the opportunity to create indices that would help improve the speed of PartitionStrategy, and others might do nothing at all, leaving PartitionStrategy to not work well for all use cases.
Going back to TinkerPop's perspective, the goal of PartitionStrategy (and SubgraphStrategy for that matter) is more to ease the manner with which Gremlin is written for use cases where parts of the graph need to be hidden. Without it, you would have lots and lots of repetitive filters mixed into your traversal which would muddy its readability.
Consider this bit of code:
graph = TinkerGraph.open()
strategy = new PartitionStrategy(partitionKey: "_partition", writePartition: "a", readPartitions: ["a"])
g = traversal().withEmbedded(graph).withStrategies(strategy)
g.addV().addE('link')
g.V().out().out().out()
The traversal is quite readable and straightforward. It is easy to understand the intent - a three step hop. But that's not really the traversal that executed. What executed was:
g.V().out().has('_partition',within("a")).
out().has('_partition',within("a")).
out().has('_partition',within("a"))
If you are using PartitionStrategy then you need to be sure it suits your graph database as well as your use case.

compilation issue with split/splitter

Here's simple code:
import std.algorithm;
import std.array;
import std.file;
void main(string[] args)
{
auto t = args[1].readText()
.splitter('\n')
.split("---")
;
}
Looks like it should work, but it won't compile. DMD 2.068.2 fails with this error:
Error: template std.algorithm.iteration.splitter cannot deduce function from
argument types !()(Result, string), candidates are:
...
Error: template instance std.array.split!(Result, string) error instantiating
It compiles if I insert .array before .split.
Am I missing something? Or is it a bug? I've tried to make a brief search in the bug tracker, but didn't find anything.
Bottom line: problems like this can often be fixed by sticking a .array call right before the offending function. This provides it a buffer with enough functionality to run the algorithm.
What follows is the reasoning behind the library and a couple other ideas you can use to implement this too:
The reason this doesn't compile has to do with the philosophy behind std.algorithm and ranges: they are kept as cheap as possible in order to push cost decisions to the top level.
In std.algorithm (and most well-written ranges and range-consuming algorithms), template constraints will reject any input that doesn't offer what it needs for free. Similarly, transforming ranges, like filter, splitter, etc., will return only those capabilities they can offer at minimal cost.
By rejecting them at compile time, they force the programmer to make the decision at the highest level as to how they want to pay those costs. You might rewrite the function to work differently, you might buffer it yourself with a variety of techniques to pay the costs up front, or whatever else you can find that works.
So here's what happens with your code: readText returns an array, which is a nearly full-featured range. (Since it returns a string, made of UTF-8, it doesn't actually offer random access as far as Phobos is concerned (though, confusingly, the language itself sees it differently; search the D forums for the "autodecode" controversy if you want to learn more), since finding a Unicode code point in a list of variable-length UTF-8 characters requires scanning it all. Scanning it all is not minimal cost, so Phobos will never attempt it unless you specifically ask for it.)
Anyway though, readText returns a range with plenty of features, including savability which splitter needs. Why does splitter need saving? Consider the result it promises: a range of strings starting at the last split point and continuing to the next split point. What does the implementation look like when writing this for a most generic range it can possibly do for cheap?
Something along these lines: first, save your starting position so you can return it later. Then, using popFront, advance through the range until you find the split point. When you do, return the saved range up to that split point. Then popFront past the split point and repeat the process until you've consumed the whole thing (while(!input.empty)).
So, since splitter's implementation requires the ability to save the starting point, it requires at least a forward range (which is just a savable range; Andrei now feels naming things like this is a bit silly because there are so many names, but at the time he was writing std.algorithm he still believed in giving them all names).
Not all ranges are forward ranges! Arrays are: saving them is as easy as returning a slice from the current position. Many numerical algorithms are too: saving them just means keeping a copy of the current state. Most transformation ranges are savable if the range they are transforming is savable - again, all they need to do is return the current state.
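As a tiny illustration of that (a hypothetical range of my own, not anything from Phobos): a forward range just has to be able to hand back a copy of its current state.

import std.range, std.stdio;

// Hypothetical forward range: an infinite counter whose save() is just a copy
// of the current state, which is all splitter-like algorithms need from it.
struct Counter
{
    int current;
    enum empty = false;                         // never runs out
    @property int front() { return current; }
    void popFront() { ++current; }
    @property Counter save() { return this; }   // saving == copying the state
}

void main()
{
    static assert(isForwardRange!Counter);      // Phobos agrees it is savable
    writeln(Counter(10).take(3));               // [10, 11, 12]
}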
......as I write this, actually, I think your example should be savable. And, indeed, there is an overload that takes a predicate and compiles!
http://dlang.org/phobos/std_algorithm_iteration.html#.splitter.3
import std.algorithm;
import std.array;
import std.stdio;
void main(string[] args)
{
auto t = "foo\n---\nbar"
.splitter('\n')
.filter!(e => e.length)
.splitter!(a => a == "---")
;
writeln(t);
}
Output: [["foo"], ["bar"]]
Yea, it compiled and split on lines equal to a particular thing. The other overload, .splitter("---"), fails to compile, because that overload requires slice functionality (or a narrow string, which Phobos refuses to slice generically... but knows it actually can be anyway, so the function is special-cased. You see that all over the library.)
But why does it require slicing instead of just saving? Honestly, I don't know. Maybe I'm missing something too, but the existence of the overload that does work implies to me that my conception of the algorithm is correct; it can be done this way. I do believe slicing is a bit cheaper, but the save version is cheap enough too (you'd keep a count of how many items you popped past to get to the split point, then return saved.take(that_count))... maybe that's the reason right there: you would iterate over the items twice, once inside the algorithm, then again outside, and the library considers that sufficiently costly to punt up a level. (The predicate version sidesteps this by making your function do the scanning, and thus Phobos considers it not its problem anymore; you are aware of what your own function is doing.)
I can see the logic in that. I could go both ways on it though, because the decision to actually run over it again is still on the outside, but I do understand why that might not be desirable to do without some thought.
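For what it's worth, here is a rough sketch (my own code, not Phobos's actual implementation) of that save/take idea for pulling out a single piece from any forward range; you can see where the second pass over the elements would come from.

import std.range, std.stdio;

// Hand-rolled sketch: scan forward counting elements until the separator,
// then hand back saved.take(count); the caller walks those elements again.
auto firstPiece(R, E)(R input, E sep)
    if (isForwardRange!R && is(typeof(input.front == sep) : bool))
{
    auto saved = input.save;   // remember where this piece starts
    size_t count = 0;
    for (; !input.empty && input.front != sep; input.popFront())
        ++count;
    return saved.take(count);
}

void main()
{
    writeln(firstPiece([1, 2, 3, 0, 4], 0)); // [1, 2, 3]
}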
Finally, why doesn't splitter offer indexing or slicing on its output? Why doesn't filter offer it either? Why DOES map offer it?
Well, it has to do with that low cost philosophy again. map can offer it (assuming its input does) because map doesn't actually change the number of elements: the first element in the output is also the first element in the input, just with some function run on the result. Ditto for the last, and all others in between.
filter changes that though. Filtering out the odd numbers of [1,2,3] yields just [2]: the length is different and 2 is now found at the beginning instead of the middle. But, you can't know where it is until you actually apply the filter - you can't jump around without buffering the result.
splitter is similar to filter. It changes the placement of elements, and the algorithm doesn't know where it splits until it actually runs through the elements. So it can tell as you iterate, but not ahead of iteration, so indexing would be O(n) speed - computationally too expensive. Indexing is supposed to be extremely cheap.
Anyway, now that we understand why the principle is there - to let you, the end programmer, make decisions about costly things like buffering (which requires more memory than is free) or additional iteration (which requires more CPU time than is cost-free to the algorithm) - and have some idea as to why splitter needs it by thinking about its implementation, we can look at ways to satisfy the algorithm: we either use the version that eats a few more CPU cycles and write it with our custom compare function (see the sample above), or provide slicing somehow. The most straightforward way is by buffering the result in an array.
import std.algorithm;
import std.array;
import std.file;
void main(string[] args)
{
auto t = args[1].readText()
.splitter('\n')
.array // add an explicit buffering call, understanding this will cost us some memory and cpu time
.split("---")
;
}
You might also buffer it locally yourself to reduce the cost of the allocation, but however you do it, the cost has to be paid somewhere, and Phobos prefers that you, the programmer - who understands the needs of your program and whether you are willing to pay these costs - make that decision, instead of paying them on your behalf without telling you.

Designing a REST hierarchy where there is duplicate data

We are having a debate on how to design REST endpoints. It basically comes down to this contrived example.
Say we have:
/netflix/movie/1/actors <- returns actors A, B and C
/netflix/movie/2/actors <- returns actors A, D, and E
Where the actor A is the same actor.
Now to get the biography of the actor which is "better" (yes, a judgement call):
/netflix/movie/1/actors/A
/netflix/movie/2/actors/A
or:
/actors/A
The disagreement ultimately stems from using Ember.js, which expects a certain hierarchy, versus the desire to not have multiple ways to access the same data (in the end it would truly be a small amount of code duplication). It is possible to map Ember.js to use /actors/A, so there is no strict technical limitation; this is really more of a philosophical question.
I have looked around and I cannot find any solid advice on this sort of thing.
I faced the same problem and went for option 2 (one "canonical" URI per resource) for the sake of simplicity and soundness (one type of resource per root).
Otherwise, when do you stop? Consider:
/actors/
/actors/A
/actors/A/movies
/actors/A/movies/1
/actors/A/movies/1/actors
/actors/A/movies/1/actors/B
...
I would, from an outsider's perspective, expect movies/1/actors/A to return information specific to that actor FOR that movie, whereas I would expect /actors/A to return information on that actor in general.
By analogy, I would expect projects/1/tasks/1/comments to return comments specific to the task - the highest level of the relationship via its url.
I would expect projects/1/comments to return comments related to the lower level project, or to aggregate all comments from the project.
The analogy isn't specific to the data in question, but I think it illustrates the point of url hierarchy leading to certain expectations about the data returned.
I would in this case clearly prefer /actors/A.
My reasoning is that /movie/1/actors reports a list. This list, being a 1-n mapping between movie and actors, is not meant to be a path with further nodes. One simply does not expect to find actors in the movie tree.
You might one day implement /actors/A/movies returning 1 & 2, and this would make you implement URLs like /actors/A/movies/2 - and here you get recursion: movie/actor/movie/actor.
I'd prefer one single URL per object, and one clear spot where the 1-n mapping can be found.
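To make that concrete, here is a sketch of what option 2 can look like on the wire (payload shapes and field names are hypothetical, not from the question): the per-movie collection stays a plain 1-n listing that points back to each actor's single canonical URI.

GET /netflix/movie/1/actors
[
  { "id": "A", "href": "/actors/A" },
  { "id": "B", "href": "/actors/B" },
  { "id": "C", "href": "/actors/C" }
]

GET /actors/A
{ "id": "A", "name": "Example Actor", "biography": "Example biography text." }

Anything about the actor in general lives only at /actors/A, so there is exactly one URI to cache, link to, and update.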

Design for customizable string filter

Suppose I have tons of filenames in my_dir/my_subdir, formatted in some way:
data11_7TeV.00179691.physics_Egamma.merge.NTUP_PHOTON.f360_m796_p541_tid319627_00
data11_7TeV.00180400.physics_Egamma.merge.NTUP_PHOTON.f369_m812_p541_tid334757_00
data11_7TeV.00178109.physics_Egamma.merge.D2AOD_DIPHO.f351_m765_p539_p540_tid312017_00
For example data11_7TeV is the data_type, 00179691 the run number, NTUP_PHOTON the data format.
I want to write an interface to do something like this:
dataset = DataManager("my_dir/my_subdir").filter_type("data11_7TeV").filter_run("> 00179691").filter_tag("m = 796");
// don't do the filtering, be lazy
cout << dataset.count(); // count is an action, do the filtering
vector<string> dataset_list = dataset.get_list(); // don't repeat the filtering
dataset.save_filter("file.txt", "ALIAS"); // save the filter (not the filenames), for example save the regex
dataset2 = DataManagerAlias("file.txt", "ALIAS"); // get the saved filter
cout << dataset2.filter_tag("p = 123").count();
I want lazy behaviour, for example no real filtering has to be done before any action like count or get_list. I don't want to redo the filtering if it is already done.
I'm just learning something about design patterns, and I think I can use:
an abstract base class AbstractFilter that implements the filter* methods
a factory to decide, from the called method, which decorator to use
every time I call a filter* method I return a decorated class, for example:
AbstractFilter::filter_run(string arg) {
decorator = factory.get_decorator_run(arg); // if arg is "> 00179691" returns FilterRunGreater(00179691)
return decorator(this);
}
a proxy that builds a regex to filter the filenames, but doesn't do the filtering
I'm also learning jQuery and I'm using a similar chaining mechanism.
Can someone give me some hints? Is there some place where a design like this is explained? The design must be very flexible, in particular to handle new formats in the filenames.
I believe you're over-complicating the design-pattern aspect and glossing over the underlying matching/indexing issues. Getting the full directory listing from disk can be expected to be orders of magnitude more expensive than the in-RAM filtering of filenames it returns, and the former needs to have completed before you can do a count() or get_list() on any dataset (though you could come up with some lazier iterator operations over the dataset).
As presented, the real functional challenge could be in indexing the filenames so you can repeatedly find the matches quickly. But, even that's unlikely as you presumably proceed from getting the dataset of filenames to actually opening those files, which is again orders of magnitude slower. So, optimisation of the indexing may not make any appreciable impact to your overall program's performance.
But, let's say you read all the matching directory entries into an array A.
Now, for filtering, it seems your requirements can generally be met using std::multimap find(), lower_bound() and upper_bound(). The most general way to approach it is to have separate multimaps for data type, run number, data format, p value, m value, tid etc. that map to a list of indices in A. You can then use existing STL algorithms to find the indices that are common to the results of your individual filters.
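A rough, self-contained C++ sketch of that multimap-index idea (the container layout and values are illustrative only):

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Sketch: filenames live in A, each index multimap maps a property value to
// positions in A, and filters are combined with std::set_intersection.
int main()
{
    std::vector<std::string> A = {
        "data11_7TeV.00179691.physics_Egamma.merge.NTUP_PHOTON.f360_m796_p541_tid319627_00",
        "data11_7TeV.00180400.physics_Egamma.merge.NTUP_PHOTON.f369_m812_p541_tid334757_00",
        "data11_7TeV.00178109.physics_Egamma.merge.D2AOD_DIPHO.f351_m765_p539_p540_tid312017_00",
    };

    // Indices by run number and by data format (built once after reading the directory).
    std::multimap<long, std::size_t> by_run = {
        {179691, 0}, {180400, 1}, {178109, 2}};
    std::multimap<std::string, std::size_t> by_format = {
        {"NTUP_PHOTON", 0}, {"NTUP_PHOTON", 1}, {"D2AOD_DIPHO", 2}};

    // filter_run("> 00179691"): everything strictly above that run number.
    std::vector<std::size_t> runs;
    for (auto it = by_run.upper_bound(179691); it != by_run.end(); ++it)
        runs.push_back(it->second);

    // filter_type-style lookup: all NTUP_PHOTON entries.
    std::vector<std::size_t> formats;
    auto range = by_format.equal_range("NTUP_PHOTON");
    for (auto it = range.first; it != range.second; ++it)
        formats.push_back(it->second);

    // Combine the two filters.
    std::sort(runs.begin(), runs.end());
    std::sort(formats.begin(), formats.end());
    std::vector<std::size_t> both;
    std::set_intersection(runs.begin(), runs.end(),
                          formats.begin(), formats.end(),
                          std::back_inserter(both));

    for (std::size_t i : both)
        std::cout << A[i] << '\n';   // prints the 00180400 NTUP_PHOTON entry
}

Each additional filter just produces another sorted index vector that gets intersected the same way.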
There are a lot of optimisations possible if you happen to have unstated insights / restrictions re your data and filtering needs (which is very likely). For example:
if you know a particular filter will always be used, and immediately cuts the potential matches down to a manageable number (e.g. < ~100), then you could use it first and resort to brute force searches for subsequent filtering.
Another possibility is to extract properties of individual filenames into a structure: std::string data_type; std::vector<int> p; etc., then write an expression evaluator supporting predicates like "p includes 924 and data_type == 'XYZ'", though by itself that lends itself to brute-force comparisons rather than faster index-based matching.
I know you said you don't want to use external libraries, but an in-memory database and SQL-like query ability may save you a lot of grief if your needs really are at the more elaborate end of the spectrum.
I would use a strategy pattern. Your DataManager is constructing a DataSet type, and the DataSet has a FilteringPolicy assigned. The default can be a NullFilteringPolicy, which means no filters. If the DataSet member function filter_type(string t) is called, it swaps out the filter policy class with a new one. The new one can be factory constructed via the filter_type param. Methods like filter_run() can be used to add filtering conditions onto the FilterPolicy. In the NullFilterPolicy case it's just no-ops. This seems straightforward to me; I hope this helps.
EDIT:
To address the method chaining you simply need to return *this, i.e. return a reference to the DataSet class. This means you can chain DataSet methods together. It's what the C++ iostream libraries do when you implement operator>> or operator<<.
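A minimal sketch of that chaining mechanism (the class layout is illustrative, not a full implementation of the lazy filtering):

#include <iostream>
#include <string>
#include <vector>

// Each filter_* method records a condition and returns *this so calls can be
// chained; count() is the action that finally applies the filters.
class DataSet
{
public:
    DataSet& filter_type(std::string t)
    {
        conditions_.push_back("type=" + t);
        return *this;                      // enables chaining
    }

    DataSet& filter_run(std::string expr)
    {
        conditions_.push_back("run" + expr);
        return *this;
    }

    std::size_t count() const
    {
        // A real implementation would lazily apply conditions_ to the
        // filenames here; this stub just reports how many were recorded.
        return conditions_.size();
    }

private:
    std::vector<std::string> conditions_;
};

int main()
{
    DataSet ds;
    std::cout << ds.filter_type("data11_7TeV").filter_run("> 00179691").count() << '\n';
}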
First of all, I think that your design is pretty smart and lends itself well to the kind of behavior you are trying to model.
Anyway, my understanding is that you are trying to build a sort of "Domain Specific Language", whereby you can chain "verbs" (the various filtering methods) representing actions on, or connections between, "entities" (where the variability is represented by the different naming formats that could exist, although you do not say anything about this).
In this respect, a very interesting discussion is found in Martin Fowler's book "Domain Specific Languages". Just to give you a taste of what it is about, here you can find an interesting discussion about the "Method Chaining" pattern, defined as:
“Make modifier methods return the host object so that multiple modifiers can be invoked in a single expression.”
As you can see, this pattern describes the very chaining mechanism you are positing in your design.
Here you have a list of all the patterns that were found interesting in defining such DSLs. Again, you will easily find there several specialized patterns that you are also implying in your design or describing by way of more generic patterns (like the decorator). A few of them are: Regex Table Lexer, Method Chaining, Expression Builder, etc. And many more that could help you further specify your design.
All in all, I could add my grain of salt by saying that I see a place for a "command processor" pattern in your specification, but I am pretty confident that by deploying the powerful abstractions that Fowler proposes you will be able to come up with a much more specific and precise design, covering aspects of the problem that right now are simply hidden by the "generality" of the GoF pattern set.
It is true that this could be "overkill" for a problem like the one you are describing, but as an exercise in pattern oriented design it can be very insightful.
I'd suggest starting with the Boost iterator library - e.g. the filter iterator.
(And, of course, boost includes a very nice regex library.)

Are infinite lists useful for any real world applications?

I've been using Haskell for quite a while now, and I've read most of Real World Haskell and Learn You a Haskell. What I want to know is whether there is a point to a language using lazy evaluation, in particular the "advantage" of having infinite lists. Is there a task which infinite lists make very easy, or even a task that is only possible with infinite lists?
Here's an utterly trivial but actually day-to-day useful example of where infinite lists specifically come in handy: When you have a list of items that you want to use to initialize some key-value-style data structure, starting with consecutive keys. So, say you have a list of strings and you want to put them into an IntMap counting from 0. Without lazy infinite lists, you'd do something like walk down the input list, keeping a running "next index" counter and building up the IntMap as you go.
With infinite lazy lists, the list itself takes the role of the running counter; just use zip [0..] with your list of items to assign the indices, then IntMap.fromList to construct the final result.
Sure, it's essentially the same thing in both cases. But having lazy infinite lists lets you express the concept much more directly without having to worry about details like the length of the input list or keeping track of an extra counter.
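A small sketch of that (function names are mine; IntMap comes from the containers package):

import qualified Data.IntMap.Strict as IntMap

-- The infinite list [0..] supplies the keys; zip stops when the items run out.
labelled :: [String] -> IntMap.IntMap String
labelled items = IntMap.fromList (zip [0 ..] items)

main :: IO ()
main = print (labelled ["foo", "bar", "baz"])
-- fromList [(0,"foo"),(1,"bar"),(2,"baz")]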
An obvious example is chaining your data processing from input to whatever you want to do with it. E.g., reading a stream of characters into a lazy list, which is processed by a lexer, also producing a lazy list of tokens which are parsed into a lazy AST structure, then compiled and executed. It's like using Unix pipes.
I found it's often easier and cleaner to just define all of a sequence in one place, even if it's infinite, and have the code that uses it just grab what it wants.
take 10 mySequence
takeWhile (<100) mySequence
instead of having numerous similar but not quite the same functions that generate a subset
first10ofMySequence
elementsUnder100ofMySequence
The benefits are greater when different subsections of the same sequence are used in different areas.
Infinite data structures (including lists) give a huge boost to modularity and hence reusability, as explained & illustrated in John Hughes's classic paper Why Functional Programming Matters.
For instance, you can decompose complex code chunks into producer/filter/consumer pieces, each of which is potentially useful elsewhere.
So wherever you see real-world value in code reuse, you'll have an answer to your question.
Basically, lazy lists allow you to delay computation until you need it. This can prove useful when you don't know in advance when to stop, and what to precompute.
A standard example is a sequence u_n of numerical computations converging to some limit. You can ask for the first term such that |u_n - u_{n-1}| < epsilon, and the right number of terms is computed for you.
Now, you have two such sequences u_n and v_n, and you want to know the sum of the limits to epsilon accuracy. The algorithm is:
compute u_n until epsilon/2 accuracy
compute v_n until epsilon/2 accuracy
return u_n + v_n
All is done lazily; only the necessary u_n and v_n are computed. You may want less simple examples, e.g. computing f(u_n) where you know (i.e. know how to compute) f's modulus of continuity.
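A minimal Haskell sketch of that idea, using Newton's method for sqrt 2 as the converging sequence (the helper names are mine):

-- Walk down the infinite list of approximations until consecutive terms agree
-- to within eps; only as many terms as needed are ever computed.
within :: Double -> [Double] -> Double
within eps (a : b : rest)
  | abs (a - b) < eps = b
  | otherwise         = within eps (b : rest)
within _ _ = error "within: sequence too short"   -- unreachable for infinite lists

approxSqrt2 :: [Double]
approxSqrt2 = iterate (\x -> (x + 2 / x) / 2) 1   -- infinite list of guesses

main :: IO ()
main = print (within 1e-9 approxSqrt2)            -- ~1.4142135623730951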
Sound synthesis - see this paper by Jerzy Karczmarczuk:
http://users.info.unicaen.fr/~karczma/arpap/cleasyn.pdf
Jerzy Karczmarcuk has a number of other papers using infinite lists to model mathematical objects like power series and derivatives.
I've translated the basic sound synthesis code to Haskell - enough for a sine wave unit generator and WAV file IO. The performance was just about adequate to run with GHCi on a 1.5GHz Athlon - as I just wanted to test the concept I never got round to optimizing it.
Infinite/lazy structures permit the idiom of "tying the knot": http://www.haskell.org/haskellwiki/Tying_the_Knot
The canonical simple example of this is the Fibonacci sequence, defined directly as a recurrence relation. (Yes, yes, hold the efficiency complaints/algorithms discussion - the point is the idiom.): fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
Here's another story. I had some code that only worked with finite streams -- it did some things to create them out to a point, then did a whole bunch of nonsense that involved acting on various bits of the stream dependent on the entire stream prior to that point, merging it with information from another stream, etc. It was pretty nice, but I realized it had a whole bunch of cruft necessary for dealing with boundary conditions, and basically what to do when one stream ran out of stuff. I then realized that conceptually, there was no reason it couldn't work on infinite streams. So I switched to a data type without a nil -- i.e. a genuine stream as opposed to a list, and all the cruft went away. Even though I know I'll never need the data past a certain point, being able to rely on it being there allowed me to safely remove lots of silly logic, and let the mathematical/algorithmic part of my code stand out more clearly.
One of my pragmatic favorites is cycle. cycle [False, True] generates the infinite list [False, True, False, True, False ...]. In particular, xs !! 0 = False, xs !! 1 = True, so this just says whether or not the index of the element is odd. Where does this show up? Lots of places, but here's one that any web developer ought to be familiar with: making tables that alternate shading from row to row.
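For instance (a toy sketch, names are mine):

-- Pair each row with a shading flag; cycle supplies as many flags as needed.
shadeRows :: [String] -> [(Bool, String)]
shadeRows = zip (cycle [False, True])

main :: IO ()
main = mapM_ print (shadeRows ["row1", "row2", "row3", "row4"])
-- (False,"row1"), (True,"row2"), (False,"row3"), (True,"row4")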
The general pattern seen here is that if we want to do some operation on a finite list, rather than having to construct a specific finite list that will “do the thing we want,” we can use an infinite list that will work for all sizes of lists. camcann’s answer is in this vein.