Related
I'm new to Clojure. Apologies if it's a stupid question!
Should I use a set instead of a vector or list each time I don't care about the order of the items? What is the common criteria to decide between these three when order is not necessary?
It really depends on how you will be using the items.
If you will be searching for items, use a set.
If you will be processing it sequentially use a list.
If you will be chopping it into even sized chunks (like while sorting) use a vector.
If you need to count the length use a vector.
If you will be typing these by hand use a vector (to save a little quoting)
In practice most of the processing I see involves turning the data into a seq and processing that so the distinctions between list and vector are often a matter personal taste.
In general, you want a set when your primary concern is "Is this thing in this group?" Besides not preserving order, sets also only hold a given value once. So if you care about the precise placement of values, a vector is more what you want. If you mainly care about testing for membership, a set is more appropriate.
Yes, use a set. Unless you have very, very good reasons to pick something else (performance, memory use, ...) the set is the correct choice.
Remember that programming is primarily about communicating with the human reader of your code and not with the computer. By using a set you make it totally clear that the order of elements is irrelevant (and you are not expecting duplicate values) helping the reader understand your intentions and your own mental mind set.
In C, you can have a pointer to the first and last element of a singly-linked list, providing constant time access to the end of a list. Thus, appending one list to another can be done in constant time.
As far as I am aware, scheme does not provide this functionality (namely constant access to the end of a list) by default. To be clear, I am not looking for "pointer" functionality. I understand that is non-idiomatic in scheme and (as I suppose) unnecessary.
Could someone either 1) demonstrate the ability to provide a way to append two lists in constant time or 2) assure me that this is already available by default in scheme or racket (e.g., tell me that append is in fact a constant operation if I am wrong to think otherwise)?
EDIT:
I should make myself clearer. I am trying to create an inspectable queue. I want to have a list that I can 1) push onto the front in constant time, 2) pop off the back in constant time, and 3) iterate over using Racket's foldr or something similar (a Lisp right fold).
Standard Lisp lists cannot be appended to in constant time.
However, if you make your own list type, you can do it. Basically, you can use a record type (or just a cons cell)---let's call this the "header"---that holds pointers to the head and tail of the list, and update it each time someone adds to the list.
However, be aware that if you do that, lists are no longer structurally inductive. i.e., a longer list isn't simply an extension of a shorter list, because of the extra "header" involved. Thus, you lose a great part of the simplicity of Lisp algorithms which involve recursing into the cdr of a list at each iteration.
In other words, the lack of easy appending is a tradeoff to enable recursive algorithms to be written much more easily. Most functional programmers will agree that this is the right tradeoff, since appending in a pure-functional sense means that you have to copy every cell in all but the last list---so it's no longer O(1), anyway.
ETA to reflect OP's edit
You can create a queue, but with the opposite behaviour: you add elements to the back, and retrieve elements in the front. If you are willing to work with that, such a data structure is easy to implement in Scheme. (And yes, it's easy to append two such queues in constant time.)
Racket also has a similar queue data structure, but it uses a record type instead of cons cells, because Racket cons cells are immutable. You can convert your queue to a list using queue->list (at O(n) complexity) for times when you need to fold.
You want a FIFO queue. user448810 mentions the standard implementation for a purely-functional FIFO queue.
Your concern about losing the "key advantage of Lisp lists" needs to be unpacked a bit:
You can write combinators for custom data structures in Lisp. If you implement a queue type, you can easily write fold, map, filter and so on for it.
Scheme, however, does lack in the area of providing polymorphic sequence functions that can work on multiple sequence types. You do often end up either (a) converting your data structures back to lists in order to use the rich library of list functions, or (b) implementing your own versions of various of these functions for your custom types.
This is very much a shame, because singly-linked lists, while they are hugely useful for tons of computations, are not a do-all data structure.
But what is worse is that there's a lot of Lisp folk who like to pretend that lists are a "universal datatype" that can and should be used to represent any kind of data. I've programmed Lisp for a living, and oh my god I hate the code that these people produce; I call it "Lisp programmer's disease," and have much too often had to go in and fix a lot of n^2 that uses lists to represent sets or dictionaries to use hash tables or search trees instead. Don't fall into that trap. Use proper data structures for the task at hand. You can always build your own opaque data types using record types and modules in Racket; you make them opaque by exporting the type but not the field accessors for the record type (you export your type's user-facing operations instead).
It sounds like you are looking for a deque, not a list. The standard idiom for a deque is to keep two lists, the front half of the list in normal order and the back half of the list in reverse order, thus giving access to both ends of the deque. If the half of the list that you want to access is empty, reverse the other half and swap the meaning of the two halves. Look here for a fuller explanation and sample code.
Scala is new to me so I'm not sure the best way to go about this.
I need to simply take the strings within a single list and join them.
So, concat(List("a","b","c")) returns abc.
Should I first see how many strings there are in the list, that way I can just loop through and join them all? I feel like that needs to be done first, that way you can use the lists just like an array and do list[1] append list[2] append list[3], etc..
Edit:
Here's my idea, of course with compile errors..
def concat(l: List[String]): String = {
var len = l.length
var i = 0
while (i < len) {
val result = result :: l(i) + " "
}
result
}
How about this, on REPL
List("a","b","c") mkString("")
or in script file
List("a","b","c").mkString("")
Some options to explore for you:
imperative: for-loop; use methods from the List object to determine
loop length or use for-each List item
classical functional: recursive function, one element at the time using
higher-order functions: look at fold.
Given the basic level of the problem, I think you're looking at learning some fundamentals in programming. If the language of choice is Scala, probably the focus is on functional programming, so I'd put effort on solving #2, then solve #1. #3 for extra credits.
This exercise is designed to encourage you to think about the problem from a functional perspective. You have a set of data over which you wish to move, performing a set of identical operations. You've already identified the imperative, looping construct (for). Simple enough. Now, how would you build that into a functional construct, not relying on "stateful" looping?
In functional programming, fold ... is a family of higher-order
functions that iterate an arbitrary function over a data structure in
some order and build up a return value.
http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29
That sounds like something you could use.
As string concatenation is associative (to be exact, it forms a monoid having the empty String as neutral element), the "direction" of the fold doesn't matter (at least if you're not bothered by performance).
Speaking of performance: In real life, it would be a good idea to use a StringBuilder for the intermediate steps, but it's up to you if you want to use it.
A bit longer that mkString but more efficient:
s.foldLeft(new StringBuilder())(_ append _).toString()
I'm just assuming here that you are not only new to Scala, but also new to programming in general. I'm not saying SO is not made for newbies, but I'm sure there are many other places, which are better suited for your needs. For example books...
I'm also assuming that your problem doesn't have to be solved in a functional, imperative or some other way. It just has to be solved as a homework assignment.
So here are the list of things you should consider / ask yourself:
If you want to concat all elements of the list do you really need to know how many there are?
If you think you do, fine, but after having solved this problem using this approach try to fiddle around with your solution a little bit to find out if there is another way.
Appending the elements to a resulting list is a thought in right direction, but think about this: in addition to being object-oriented Scala is also a full-blown functional language. You might not know what this means, but all you need to know for now is this: it is pretty darn good with things like lists (LISP is the most known functional language and it stands for LISt Processing, which has to be an indication of some kind, don't you think? ;)). So maybe there is some magical (maybe even Scala idiomatic) way to accomplish such a concatination without defining the resulting list yourself.
Which is better idiomatic clojure practice for representing a tree made up of different node types:
A. building trees out of several different types of records, that one defines using deftype or defrecord:
(defrecord node_a [left right])
(defrecord node_b [left right])
(defrecord leaf [])
(def my-tree (node_a. (node_b. (leaf.) (leaf.)) (leaf.)))
B. building trees out of vectors, with keywords designating the types:
(def my-tree [:node-a [:node-b :leaf :leaf] :leaf])
Most clojure code that I see seems to favor the usage of the general purpose data structures (vectors, maps, etc.), rather than datatypes or records. Hiccup, to take one example, represents html very nicely using the vector + keyword approach.
When should we prefer one style over the other?
You can put as many elements into a vector as you want. A record has a set number of fields. If you want to constrain your nodes to only have N sub-nodes, records might be good, e.g. making when a binary tree, where a node has to have only a Left and Right. But for something like HTML or XML, you probably want to support arbitrary numbers of sub-nodes.
Using vectors and keywords means that "extending" the set of supported node types is as simple as putting a new keyword into the vector. [:frob "foo"] is OK in Hiccup even if its author never heard of frobbing. Using records, you'd potentially have to define a new record for every node type. But then you get the benefit of catching typos and verifying subnodes. [:strnog "some bold text?"] isn't going to be caught by Hiccup, but (Strnog. "foo") would be a compile-time error.
Vectors being one of Clojure's basic data types, you can use Clojure's built-in functions to manipulate them. Want to extend your tree? Just conj onto it, or update-in, or whatever. You can build up your tree incrementally this way. With records, you're probably stuck with constructor calls, or else you have to write a ton of wrapper functions for the constructors.
Seems like this partly boils down to an argument of dynamic vs. static. Personally, I would go the dynamic (vector + keyword) route unless there was a specific need for the benefits of using records. It's probably easier to code that way, and it's more flexible for the user, at the cost of being easier for the user to end up making a mess. But Clojure users are likely used to having to handle dangerous weapons on a regular basis. Clojure being largely a dynamic language, staying dynamic is often the right thing to do.
This is a good question. I think both are appropriate for different kinds of problems. Nested vectors are a good solution if each node can contain a variable set of information - in particular templating systems are going to work well. Records are a good solution for a smallish number of fixed node types where nesting is far more constrained.
We do a lot of work with heterogeneous trees of records. Each node represents one of a handful of well-known types, each with a different set of known fixed keys. The reason records are better in this case is that you can pick the data out of the node by key which is O(1) (really a Java method call which is very fast), not O(n) (where you have to look through the node contents) and also generally easier to access.
Records in 1.2 are imho not quite "finished" but it's pretty easy to build that stuff yourself. We have a defrecord2 that adds constructor functions (new-foo), field validation, print support, pprint support, tree walk/edit support via zippers, etc.
An example of where we use this is to represent ASTs or execution plans where nodes might be things like Join, Sort, etc.
Vectors are going to be better for creating stuff like strings where an arbitrary number of things can be put in each node. If you can stuff 1+ <p>s inside a <div>, then you can't create a record that contains a :p field - that just doesn't make any sense. That's a case where vectors are far more flexible and idiomatic.
Why does nobody seem to use tuples in C++, either the Boost Tuple Library or the standard library for TR1? I have read a lot of C++ code, and very rarely do I see the use of tuples, but I often see lots of places where tuples would solve many problems (usually returning multiple values from functions).
Tuples allow you to do all kinds of cool things like this:
tie(a,b) = make_tuple(b,a); //swap a and b
That is certainly better than this:
temp=a;
a=b;
b=temp;
Of course you could always do this:
swap(a,b);
But what if you want to rotate three values? You can do this with tuples:
tie(a,b,c) = make_tuple(b,c,a);
Tuples also make it much easier to return multiple variable from a function, which is probably a much more common case than swapping values. Using references to return values is certainly not very elegant.
Are there any big drawbacks to tuples that I'm not thinking of? If not, why are they rarely used? Are they slower? Or is it just that people are not used to them? Is it a good idea to use tuples?
A cynical answer is that many people program in C++, but do not understand and/or use the higher level functionality. Sometimes it is because they are not allowed, but many simply do not try (or even understand).
As a non-boost example: how many folks use functionality found in <algorithm>?
In other words, many C++ programmers are simply C programmers using C++ compilers, and perhaps std::vector and std::list. That is one reason why the use of boost::tuple is not more common.
Because it's not yet standard. Anything non-standard has a much higher hurdle. Pieces of Boost have become popular because programmers were clamoring for them. (hash_map leaps to mind). But while tuple is handy, it's not such an overwhelming and clear win that people bother with it.
The C++ tuple syntax can be quite a bit more verbose than most people would like.
Consider:
typedef boost::tuple<MyClass1,MyClass2,MyClass3> MyTuple;
So if you want to make extensive use of tuples you either get tuple typedefs everywhere or you get annoyingly long type names everywhere. I like tuples. I use them when necessary. But it's usually limited to a couple of situations, like an N-element index or when using multimaps to tie the range iterator pairs. And it's usually in a very limited scope.
It's all very ugly and hacky looking when compared to something like Haskell or Python. When C++0x gets here and we get the 'auto' keyword tuples will begin to look a lot more attractive.
The usefulness of tuples is inversely proportional to the number of keystrokes required to declare, pack, and unpack them.
For me, it's habit, hands down: Tuples don't solve any new problems for me, just a few I can already handle just fine. Swapping values still feels easier the old fashioned way -- and, more importantly, I don't really think about how to swap "better." It's good enough as-is.
Personally, I don't think tuples are a great solution to returning multiple values -- sounds like a job for structs.
But what if you want to rotate three values?
swap(a,b);
swap(b,c); // I knew those permutation theory lectures would come in handy.
OK, so with 4 etc values, eventually the n-tuple becomes less code than n-1 swaps. And with default swap this does 6 assignments instead of the 4 you'd have if you implemented a three-cycle template yourself, although I'd hope the compiler would solve that for simple types.
You can come up with scenarios where swaps are unwieldy or inappropriate, for example:
tie(a,b,c) = make_tuple(b*c,a*c,a*b);
is a bit awkward to unpack.
Point is, though, there are known ways of dealing with the most common situations that tuples are good for, and hence no great urgency to take up tuples. If nothing else, I'm not confident that:
tie(a,b,c) = make_tuple(b,c,a);
doesn't do 6 copies, making it utterly unsuitable for some types (collections being the most obvious). Feel free to persuade me that tuples are a good idea for "large" types, by saying this ain't so :-)
For returning multiple values, tuples are perfect if the values are of incompatible types, but some folks don't like them if it's possible for the caller to get them in the wrong order. Some folks don't like multiple return values at all, and don't want to encourage their use by making them easier. Some folks just prefer named structures for in and out parameters, and probably couldn't be persuaded with a baseball bat to use tuples. No accounting for taste.
As many people pointed out, tuples are just not that useful as other features.
The swapping and rotating gimmicks are just gimmicks. They are utterly confusing to those who have not seen them before, and since it is pretty much everyone, these gimmicks are just poor software engineering practice.
Returning multiple values using tuples is much less self-documenting then the alternatives -- returning named types or using named references. Without this self-documenting, it is easy to confuse the order of the returned values, if they are mutually convertible, and not be any wiser.
Not everyone can use boost, and TR1 isn't widely available yet.
When using C++ on embedded systems, pulling in Boost libraries gets complex. They couple to each other, so library size grows. You return data structures or use parameter passing instead of tuples. When returning tuples in Python the data structure is in the order and type of the returned values its just not explicit.
You rarely see them because well-designed code usually doesn't need them- there are not to many cases in the wild where using an anonymous struct is superior to using a named one.
Since all a tuple really represents is an anonymous struct, most coders in most situations just go with the real thing.
Say we have a function "f" where a tuple return might make sense. As a general rule, such functions are usually complicated enough that they can fail.
If "f" CAN fail, you need a status return- after all, you don't want callers to have to inspect every parameter to detect failure. "f" probably fits into the pattern:
struct ReturnInts ( int y,z; }
bool f(int x, ReturnInts& vals);
int x = 0;
ReturnInts vals;
if(!f(x, vals)) {
..report error..
..error handling/return...
}
That isn't pretty, but look at how ugly the alternative is. Note that I still need a status value, but the code is no more readable and not shorter. It is probably slower too, since I incur the cost of 1 copy with the tuple.
std::tuple<int, int, bool> f(int x);
int x = 0;
std::tuple<int, int, bool> result = f(x); // or "auto result = f(x)"
if(!result.get<2>()) {
... report error, error handling ...
}
Another, significant downside is hidden in here- with "ReturnInts" I can add alter "f"'s return by modifying "ReturnInts" WITHOUT ALTERING "f"'s INTERFACE. The tuple solution does not offer that critical feature, which makes it the inferior answer for any library code.
Certainly tuples can be useful, but as mentioned there's a bit of overhead and a hurdle or two you have to jump through before you can even really use them.
If your program consistently finds places where you need to return multiple values or swap several values, it might be worth it to go the tuple route, but otherwise sometimes it's just easier to do things the classic way.
Generally speaking, not everyone already has Boost installed, and I certainly wouldn't go through the hassle of downloading it and configuring my include directories to work with it just for its tuple facilities. I think you'll find that people already using Boost are more likely to find tuple uses in their programs than non-Boost users, and migrants from other languages (Python comes to mind) are more likely to simply be upset about the lack of tuples in C++ than to explore methods of adding tuple support.
As a data-store std::tuple has the worst characteristics of both a struct and an array; all access is nth position based but one cannot iterate through a tuple using a for loop.
So if the elements in the tuple are conceptually an array, I will use an array and if the elements are not conceptually an array, a struct (which has named elements) is more maintainable. ( a.lastname is more explanatory than std::get<1>(a)).
This leaves the transformation mentioned by the OP as the only viable usecase for tuples.
I have a feeling that many use Boost.Any and Boost.Variant (with some engineering) instead of Boost.Tuple.