seq function caveats in Clojure

The docstring of Clojure's seq function mentions:
Note that seqs cache values, thus seq
should not be used on any Iterable whose iterator repeatedly
returns the same mutable object.
What does this sentence mean? Why emphasize the same mutable object?

The comment was added later and mentions this ticket:
Some Java libraries return iterators that return the same mutable object on every call:
Hadoop ReduceContextImpl$ValueIterator
Mahout DenseVector$AllIterator/NonDefaultIterator
LensKit FastIterators
While careful usage of seq or iterator-seq over these iterators worked in the past, that is no longer true as of the changes in CLJ-1669 - iterator-seq now produces a chunked sequence. Because next() is called 32 times on the iterator before the first value can be retrieved from the seq, and the same mutable object is returned every time, code on iterators like this now receives different (incorrect) results.
Approach: Sequences cache values and are thus incompatible with holding mutable and mutating Java objects. We will add some clarification about this to seq and iterator-seq docstrings. For those iterators above, it is recommended to either process those iterators in a loop/recur or to wrap them in a lazy-seq that transforms each re-returned mutable object into a proper value prior to caching.

Clojure's seq function can create sequences from many types of objects, like collections and arrays. seq also works on any object that implements the java.util.Iterable interface from the Java Collections Framework. Unfortunately, the semantics of Clojure sequences and of java.util.Iterator (the type Iterable hands out) are not 100% compatible, as pointed out in the answer from @cfrick.
It is, or at some point was, considered acceptable for each invocation of the Iterator's next method to return the same (mutable) object. This works only as long as the return value of next is used and discarded before the subsequent call to next. If return values of next are retained and used later, undefined behavior can result. That is exactly what happens in some implementations of Clojure sequences.
Let me illustrate. The following is a toy implementation of a range of integers in Java. Notice how the implementation of the method next always returns the same object.
package foo.bar;

import java.util.*;

public class MyRange implements Iterable<MyRange.Num> {
    public static class Num {
        private int n;
        public int get() { return n; }
        public String toString() { return String.valueOf(n); }
    }

    private int max;

    public MyRange(int max) { this.max = max; }

    // Implementation of Iterable
    public Iterator<Num> iterator() {
        return new Iterator<Num>() {
            private int at = 0;
            private Num num = new Num();

            public boolean hasNext() {
                return at < max;
            }

            public Num next() {
                num.n = at++;
                return num;
            }
        };
    }
}
This code works fine when consumed in a way intended by the designers of the Java Collections framework. For example:
(loop [i (.iterator (MyRange. 3))]
  (when (.hasNext i)
    (print (str (.next i) " "))
    (recur i)))
;;=> 0 1 2
But once we bring Clojure sequence into the mix, things go wrong:
(map #(.get %) (MyRange. 3))
;;=> (2 2 2)
We got (2 2 2) instead of (0 1 2). This is exactly the type of issue the warning in seq's docstring is concerned about.
If memory serves, the Iterator implementation for EnumMap in Java 6 used this mutable-object technique in the name of efficiency. Such an implementation does not allocate memory on every iteration, so it is faster and creates no garbage. But the technique was problematic not only for Clojure but for some Java users as well, so the behavior was changed in Java 7.
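The fix recommended in the ticket, turning each re-returned mutable object into a proper value before anything caches it, can be illustrated on the Java side as well. Below is a minimal sketch; the reusing iterator is my own toy, analogous to MyRange's reused Num object, and the point is that calling toString() snapshots the value before the object mutates again.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CopyBeforeCaching {
    // A toy iterator that mutates and returns the same StringBuilder,
    // standing in for MyRange's reused Num object.
    static Iterator<StringBuilder> reusingIterator(int max) {
        return new Iterator<StringBuilder>() {
            private final StringBuilder sb = new StringBuilder();
            private int at = 0;
            public boolean hasNext() { return at < max; }
            public StringBuilder next() {
                sb.setLength(0);
                sb.append(at++);
                return sb; // same mutable object every time
            }
        };
    }

    public static void main(String[] args) {
        // Copy each re-returned object into an immutable value *before*
        // anything caches or retains it.
        List<String> values = new ArrayList<>();
        Iterator<StringBuilder> it = reusingIterator(3);
        while (it.hasNext()) {
            values.add(it.next().toString()); // toString() makes a snapshot
        }
        System.out.println(values); // [0, 1, 2], not [2, 2, 2]
    }
}
```

This is the same idea as wrapping the iterator in a lazy-seq that transforms each element prior to caching: the mutable object never escapes the loop body.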


What is the difference between a var and val definition in Scala and why does the language need both? Why would you choose a val over a var and vice versa?
As so many others have said, the object assigned to a val cannot be replaced, and the object assigned to a var can. However, said object can have its internal state modified. For example:
class A(n: Int) {
  var value = n
}

class B(n: Int) {
  val value = new A(n)
}

object Test {
  def main(args: Array[String]) {
    val x = new B(5)
    x = new B(6)       // Doesn't work, because I can't replace the object created on the line above with this new one.
    x.value = new A(6) // Doesn't work, because I can't replace the object assigned to B.value with a new one.
    x.value.value = 6  // Works, because A.value can receive a new object.
  }
}
So, even though we can't change the object assigned to x, we could change the state of that object. At the root of it, however, there was a var.
Now, immutability is a good thing for many reasons. First, if an object doesn't change internal state, you don't have to worry if some other part of your code is changing it. For example:
x = new B(0)
f(x)
if (x.value.value == 0)
println("f didn't do anything to x")
else
println("f did something to x")
This becomes particularly important with multithreaded systems. In a multithreaded system, the following can happen:
x = new B(1)
f(x)
if (x.value.value == 1) {
print(x.value.value) // Can be different than 1!
}
If you use val exclusively, and only use immutable data structures (that is, avoid arrays, everything in scala.collection.mutable, etc.), you can rest assured this won't happen. That is, unless there's some code, perhaps even a framework, doing reflection tricks -- reflection can change "immutable" values, unfortunately.
That's one reason, but there is another reason for it. When you use var, you can be tempted into reusing the same var for multiple purposes. This has some problems:
It will be more difficult for people reading the code to know what is the value of a variable in a certain part of the code.
You may forget to re-initialize the variable in some code path, and end up passing wrong values downstream in the code.
Simply put, using val is safer and leads to more readable code.
We can, then, go the other direction. If val is so much better, why have var at all? Well, some languages did take that route, but there are situations in which mutability improves performance, a lot.
For example, take an immutable Queue. When you either enqueue or dequeue things in it, you get a new Queue object. How then, would you go about processing all items in it?
I'll go through that with an example. Let's say you have a queue of digits, and you want to compose a number out of them. For example, if I have a queue with 2, 1, 3, in that order, I want to get back the number 213. Let's first solve it with a mutable.Queue:
def toNum(q: scala.collection.mutable.Queue[Int]) = {
  var num = 0
  while (!q.isEmpty) {
    num *= 10
    num += q.dequeue
  }
  num
}
This code is fast and easy to understand. Its main drawback is that the queue passed in is modified by toNum, so you have to make a copy of it beforehand. That's the kind of object management immutability frees you from.
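For comparison, here is a hedged Java sketch of the same destructive algorithm using ArrayDeque (the class and method names are my own); it shares the identical drawback that the caller's queue is drained:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class ToNum {
    // Destructive version: drains the queue, just like the Scala
    // mutable.Queue example above.
    static int toNum(Deque<Integer> q) {
        int num = 0;
        while (!q.isEmpty()) {
            num = num * 10 + q.removeFirst();
        }
        return num;
    }

    public static void main(String[] args) {
        Deque<Integer> q = new ArrayDeque<>(List.of(2, 1, 3));
        System.out.println(toNum(q));    // 213
        System.out.println(q.isEmpty()); // true: the caller's queue was consumed
    }
}
```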
Now, let's convert it to an immutable.Queue:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
  def recurse(qr: scala.collection.immutable.Queue[Int], num: Int): Int = {
    if (qr.isEmpty)
      num
    else {
      val (digit, newQ) = qr.dequeue
      recurse(newQ, num * 10 + digit)
    }
  }
  recurse(q, 0)
}
Because I can't reuse some variable to keep track of my num, as in the previous example, I need to resort to recursion. In this case, it is tail recursion, which performs quite well. But that is not always the case: sometimes there is just no good (readable, simple) tail-recursive solution.
Note, however, that I can rewrite that code to use an immutable.Queue and a var at the same time! For example:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
  var qr = q
  var num = 0
  while (!qr.isEmpty) {
    val (digit, newQ) = qr.dequeue
    num *= 10
    num += digit
    qr = newQ
  }
  num
}
This code is still efficient, does not require recursion, and you don't need to worry whether you have to make a copy of your queue or not before calling toNum. Naturally, I avoided reusing variables for other purposes, and no code outside this function sees them, so I don't need to worry about their values changing from one line to the next -- except when I explicitly do so.
Scala opted to let the programmer do that, if the programmer deemed it to be the best solution. Other languages have chosen to make such code difficult. The price Scala (and any language with widespread mutability) pays is that the compiler doesn't have as much leeway in optimizing the code as it could otherwise. Java's answer to that is optimizing the code based on the run-time profile. We could go on and on about pros and cons to each side.
Personally, I think Scala strikes the right balance, for now. It is far from perfect, and I think both Clojure and Haskell have very interesting notions not adopted by Scala, but Scala has its own strengths as well. We'll see what comes up in the future.
val is final, that is, cannot be set. Think final in java.
In simple terms:
var = variable
val = variable + final
val means immutable and var means mutable.
The difference is that a var can be re-assigned to whereas a val cannot. The mutability, or otherwise of whatever is actually assigned, is a side issue:
import collection.immutable
import collection.mutable
var m = immutable.Set("London", "Paris")
m = immutable.Set("New York") //Reassignment - I have change the "value" at m.
Whereas:
val n = immutable.Set("London", "Paris")
n = immutable.Set("New York") //Will not compile as n is a val.
And hence:
val n = mutable.Set("London", "Paris")
n = mutable.Set("New York") //Will not compile, even though the type of n is mutable.
If you are building a data structure, all of its fields are vals, and each of those fields itself holds an immutable value, then that data structure is immutable, as its state cannot change.
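The same reasoning carries over to Java: a class whose fields are all final and of immutable types cannot change state after construction, and "mutation" becomes returning a new object. A minimal sketch (the Point class is my own illustration):

```java
final class Point {
    // All fields final and of immutable types: instances of Point
    // can never change state after construction.
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }

    // "Mutation" returns a new object instead of modifying this one.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```

Because translate returns a fresh Point, the original can be freely shared across threads without synchronization, which is exactly the benefit the answer above describes.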
Thinking in terms of C++,
val x: T
is analogous to constant pointer to non-constant data
T* const x;
while
var x: T
is analogous to non-constant pointer to non-constant data
T* x;
Favoring val over var increases immutability of the codebase which can facilitate its correctness, concurrency and understandability.
To understand the meaning of having a constant pointer to non-constant data consider the following Scala snippet:
val m = scala.collection.mutable.Map(1 -> "picard")
m // res0: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard)
Here the "pointer" val m is constant so we cannot re-assign it to point to something else like so
m = n // error: reassignment to val
however we can indeed change the non-constant data itself that m points to like so
m.put(2, "worf")
m // res1: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard, 2 -> worf)
"val means immutable and var means mutable."
To paraphrase, "val means value and var means variable".
That distinction happens to be extremely important in computing, because those two concepts define the very essence of what programming is all about. OO has managed to blur it almost completely, because in OO the only axiom is that "everything is an object". As a consequence, many programmers these days tend not to understand or appreciate the distinction, having been trained to "think the OO way" exclusively. This often leads to variable/mutable objects being used everywhere, when value/immutable objects would often have been the better choice.
val means immutable and var means mutable
You can think of val as the Java final keyword or the C++ const keyword.
val is final: it cannot be reassigned.
var, on the other hand, can be reassigned later.
It's as simple as the names suggest:
var means it can vary
val means invariable
Val - values are typed storage constants. Once created, a value cannot be reassigned. A new value can be defined with the keyword val.
e.g. val x: Int = 5
Here the type is optional, as Scala can infer it from the assigned value.
Var - variables are typed storage units to which values can be assigned again and again.
e.g. var x: Int = 5
Data stored in both kinds of storage is automatically deallocated by the JVM once it is no longer needed.
In Scala, values are preferred over variables due to the stability they bring to the code, particularly in concurrent and multithreaded code.
Though many have already answered the difference between val and var, one point deserves care. It is sometimes claimed that val is weaker than Java's final because a recursive method's parameter seems to take a new value on every call:
def factorial(num: Int): Int = {
  if (num == 0) 1
  else factorial(num - 1) * num
}
Method parameters are vals by default, yet the val is never actually mutated: each recursive call creates a fresh binding for num, which is exactly how final method parameters behave in Java as well.
In JavaScript terms, it is the same as:
val -> const
var -> var

How do I sort two tuples of containers?

I have two tuples, each containing containers of different types.
std::tuple<containerA<typesA>...> tupleA;
std::tuple<containerB<typesB>...> tupleB;
So, as an example tupleA might be defined like this:
std::tuple<list<int>, list<float>> tupleA;
The two containers, containerA and containerB are different types. typesA and typesB do not intersect. I want to sort the containers inside the tuples by their sizes, and be able to access them by their type. So, as an example
std::tuple<list<int>, list<float>> tupleA {{2}, {3.3f, 4.2f}};
std::tuple<deque<double>, deque<uint8_t>> tupleB {{2.0, 1.2, 4.4}, {}};
auto sortedArray = sort(tupleA, tupleB);
sortedArray == {deque<uint8_t>, list<int>, list<float>, deque<double>};
sortedArray.get<list<float>>() == {3.3f, 4.2f};
std::get<list<int>>(tupleA).push_back(4);
std::get<list<int>>(tupleA).push_back(5);
std::get<list<int>>(tupleA).push_back(6);
sortedArray = sort(tupleA, tupleB);
sortedArray == {deque<uint8_t>, list<float>, deque<double>, list<int>};
The most important part is that I have to store sortedArray and that the sizes of the elements in the tuple may change. I have not been able to create such a sort function. The actual syntax of accessing the sortedArray is not important.
I tried to use naive indexes into the container data inside the sortedArray, such as using std::pair<A, 1> to refer to the second element in tuple A, but I can't access the information of the tuple through it because the index is not a constexpr.
This is possible but not easy.
First, you need to generate a compile-time list of every permutation of N elements (there are N factorial of them).
Write both a runtime and compile time permutation object (separate).
Make a runtime map from permutation to index. (map step)
Convert your tuple to a vector of (index, size). Sort this by size. Extract the permutation. Map this to the index of the set of permutations. (Sort step, uses map step)
Write a "magic switch" that takes a function object f and a permutation index, and invokes the f with the compile time permutation. (magic step)
Write code that takes a compile time permutation and reorders a tuple based on it. (reorder step)
Write code that takes a function object f and a tuple. Do (sort step). Do (magic step), feeding it a second function object g that takes the passed in permutation and the tuple and (reorder step), then calls f with it.
Call the function that does this bob.
std::tuple<list<int>, list<float>> tupleA {{2}, {3.3f, 4.2f}};
std::tuple<deque<double>, deque<uint8_t>> tupleB {{2.0, 1.2, 4.4}, {}};
bob(concat_tuple_tie(tupleA, tupleB), [&](auto&& sorted_array){
    static_assert( std::is_same<
        std::tuple<deque<uint8_t>&, list<int>&, list<float>&, deque<double>&>,
        std::decay_t<decltype(sorted_array)>
    >::value, "wrong order" );
    sorted_array.get<list<float>>() == {3.3f, 4.2f};
});
std::get<list<int>>(tupleA).push_back(4);
std::get<list<int>>(tupleA).push_back(5);
std::get<list<int>>(tupleA).push_back(6);
bob(concat_tuple_tie(tupleA, tupleB), [&](auto&& sorted_array){
    static_assert( std::is_same<
        std::tuple<deque<uint8_t>&, list<float>&, deque<double>&, list<int>&>,
        std::decay_t<decltype(sorted_array)>
    >::value, "wrong order" );
});
Personally, I doubt you need to do this.
I could do this, but it might take me hours, so I'm not going to do it for a SO answer. You can look at my magic switch code for the most magic of the above. The other hard part is that permutation step.
Note that the code uses continuation passing style. Also note that every permutation of the lambda is instantiated, including the wrong ones, so your code must be valid for every permutation. This can result in an insane amount of code bloat.
An alternative solution could involve creating type-erasure wrappers around your containers and simply sorting those, but that is not what you asked for.
The problem you describe sounds very much like you want dynamic polymorphism, not static polymorphism, and your difficulty in solving it comes from trying to express it with tools designed for static polymorphism.
i.e. you need a type (hierarchy) for collections (or pointers to collections) that let you select the type of collection and the type of data at run-time, but still provides a unified interface to functions you need like size (e.g. by inheritance and virtual functions).
e.g. a beginning of a sketch might look like
struct SequencePointer
{
    virtual ~SequencePointer() = default;
    virtual size_t size() const = 0;
    virtual boost::any operator[](size_t n) const = 0;
};

template <typename Sequence>
struct SLSequencePointer : SequencePointer
{
    const Sequence *ptr;
    size_t size() const override { return ptr->size(); }
    boost::any operator[](size_t n) const override { return (*ptr)[n]; }
};
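To make the dynamic-polymorphism idea concrete in a different setting, here is a hedged Java sketch (class and method names are mine): heterogeneous containers already share the java.util.Collection interface, so sorting them by size at run time needs no compile-time permutation machinery at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class SortBySize {
    // Run-time sort by current size; calling it again later reflects any growth.
    static List<Collection<?>> sortBySize(List<Collection<?>> containers) {
        List<Collection<?>> sorted = new ArrayList<>(containers);
        sorted.sort(Comparator.comparingInt(c -> c.size()));
        return sorted;
    }

    public static void main(String[] args) {
        // Different element types and different container types,
        // unified behind Collection<?>.
        List<Collection<?>> all = Arrays.asList(
            new ArrayList<>(List.of(2)),              // like list<int>,    size 1
            new ArrayList<>(List.of(3.3f, 4.2f)),     // like list<float>,  size 2
            new ArrayDeque<>(List.of(2.0, 1.2, 4.4)), // like deque<double>, size 3
            new ArrayDeque<Integer>()                 // like deque<uint8_t>, size 0
        );
        for (Collection<?> c : sortBySize(all)) {
            System.out.println(c.size() + ": " + c);
        }
    }
}
```

The trade-off is the usual one: you give up the static element types inside the sorted view, which is precisely what the type-erasure alternative mentioned above would cost in C++ as well.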

Assign object into unordered_map array?

typedef tr1::unordered_map <string, pin *> pin_cmp;
pin_cmp _pin_cmp;
_Pins[_num_pins] = new pin (pin_id, _num_pins, s, n, d);
_pin_cmp[_Pins[_num_pins]->get_name ()] = _Pins[_num_pins]; //performance profiling
Could you explain what this code is actually doing?
_pin_cmp[_Pins[_num_pins]->get_name ()] = _Pins[_num_pins]; //performance profiling
I am not familiar with using unordered_map with the array-style [] operator. I am confused: an unordered_map just needs a key and a value, so why is [] used here?
In the above example, I expect _Pins to be a sequential container.
_pin_cmp[_Pins[_num_pins]->get_name ()] = _Pins[_num_pins]; //performance profiling
This line of code accesses the element _Pins[_num_pins] twice:
on the right-hand side, to get the object;
on the left-hand side, to get the name of the object.
Then the object is placed inside _pin_cmp (the unordered map), using the object's name as the key.
The exact behavior of this operation is described in the documentation for std::unordered_map::operator[].
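For readers more at home in Java, here is a hedged analogy (Pin and indexByName are hypothetical stand-ins for the C++ pin class and the surrounding code): the line behaves like putting each object into a HashMap keyed by its own name.

```java
import java.util.HashMap;
import java.util.Map;

public class PinIndex {
    // Hypothetical stand-in for the C++ `pin` class; only the name matters here.
    record Pin(String name, int id) {}

    // Equivalent of the C++ line: _pin_cmp[_Pins[i]->get_name()] = _Pins[i];
    static Map<String, Pin> indexByName(Pin[] pins) {
        Map<String, Pin> byName = new HashMap<>();
        for (Pin p : pins) {
            byName.put(p.name(), p); // like map[key] = value in C++
        }
        return byName;
    }

    public static void main(String[] args) {
        Pin[] pins = { new Pin("clk", 0), new Pin("rst", 1) };
        Map<String, Pin> byName = indexByName(pins);
        System.out.println(byName.get("clk").id()); // 0
    }
}
```

One nuance the analogy hides: C++'s operator[] default-constructs the mapped value if the key is absent and then assigns to it, whereas Java's put simply inserts or replaces.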

How to write this java code in clojure (tips)

The question
I'm learning Clojure, and I love learning languages by example. But I don't like just getting a complete answer without having to think about it.
So what I want is just some tips on which functions I might need, and maybe some other clues.
The answer I will accept is the one that gives me the necessary building blocks to create this.
public class IntervalMap<K extends Comparable<K>, V> extends TreeMap<K, V> {

    V defaultValue = null;

    public IntervalMap(V defaultValue) {
        super();
        this.defaultValue = defaultValue;
    }

    /**
     * Get the value corresponding to the given key.
     *
     * @param key
     * @return The value corresponding to the largest key 'k' so that
     *         "k is the largest key while being smaller than 'key'"
     */
    public V getValue(K key) {
        // if it is equal to a key in the map, we can already return the result
        if (containsKey(key))
            return super.get(key);
        // Find the largest key 'k' that is smaller than 'key',
        // starting from the highest key
        K k = lastKey();
        // compareTo only guarantees the sign of its result, so test >= 0
        // rather than != -1
        while (k.compareTo(key) >= 0) {
            k = lowerKey(k);
            if (k == null)
                return defaultValue;
        }
        return super.get(k);
    }

    @Override
    public V get(Object key) {
        return getValue((K) key);
    }
}
Update
I want to recreate the functionality of this class
For examples you can go here: Java Code Snippet: IntervalMap
I'd be looking at some combination of:
(sorted-map & key-vals) - will allow you to create a map ordered by keys (to supply your own comparator and define the order, use sorted-map-by).
(contains? coll key) - tests whether a collection holds an item identified by the argument (this is a common source of confusion when applied to vector, where contains? returns true if there is an element at the given index rather than the given value)
(drop-while pred coll) lets you skip items in a collection while a predicate is true
I would just use a map combined with a function to retrieve the closest value given a certain key. Read about maps and functions if you want to know more.
If you want to be able to mutate the data in the map, store the map in one of clojure's mutable storage facilities, for example an atom or ref. Read about mutable data if you want to know more.
You could use a function that has closed over the default value and/or the map or atom referring to a map. Read about closures if you want to know more.
The use of Protocols might come in handy here too. So, read about that too. Enough to get you going? ;-)
A few things that I used in my implementation of an interval-get function:
contains?, as @sw1nn suggested, is perfect for checking whether a map contains a particular key.
keys can be used to get all of the keys in a map.
filter keeps all of the elements in a sequence meeting some predicate.
sort, as you may have guessed, sorts a sequence.
last returns the last element in a sequence or nil if the sequence is empty.
if-let can be used to bind and act on a value if it is not falsey.
The usage of the resulting function was as follows:
(def m {0 "interval 1", 5 "interval 2"})
(interval-get m 3) ; "interval 1"
(interval-get m 5) ; "interval 2"
(interval-get m -1) ; nil
If you want to implement the code block "conceptually" in Clojure, then the existing answers already answer your question; but in case you want the code block to be "structurally" the same in Clojure (i.e. the subclassing, etc.), then have a look at gen-class and proxy in the Clojure documentation.
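One more building block worth knowing on the Java side: TreeMap already provides floorEntry, which returns the entry with the greatest key less than or equal to the given key, i.e. exactly the search that getValue implements by hand. A hedged sketch (intervalGet is my own name), matching the usage shown in the Clojure examples above:

```java
import java.util.Map;
import java.util.TreeMap;

public class IntervalLookup {
    // Same semantics as IntervalMap.getValue, but using TreeMap.floorEntry:
    // the entry with the greatest key <= the given key, or a default.
    static <K, V> V intervalGet(TreeMap<K, V> map, K key, V defaultValue) {
        Map.Entry<K, V> e = map.floorEntry(key);
        return e == null ? defaultValue : e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> m = new TreeMap<>();
        m.put(0, "interval 1");
        m.put(5, "interval 2");
        System.out.println(intervalGet(m, 3, null));  // interval 1
        System.out.println(intervalGet(m, 5, null));  // interval 2
        System.out.println(intervalGet(m, -1, null)); // null
    }
}
```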

How do you use ranges in D?

Whenever I try to use ranges in D, I fail miserably.
What is the proper way to use ranges in D? (See inline comments for my confusion.)
void print(R)(/* ref? auto ref? neither? */ R r)
{
    foreach (x; r)
    {
        writeln(x);
    }

    // Million $$$ question:
    //
    // Will I get back the same things as last time?
    // Do I have to check for this every time?
    foreach (x; r)
    {
        writeln(x);
    }
}

void test2(alias F, R)(/* ref/auto ref? */ R items)
{
    // Will it consume items?
    // _Should_ it consume items?
    // Will the caller be affected? How do I know?
    // Am I supposed to?
    F(items);
}
You should probably read this tutorial on ranges if you haven't.
When a range will and won't be consumed depends on its type. If it's an input range and not a forward range (e.g if it's an input stream of some kind - std.stdio.byLine would be one example of this), then iterating over it in any way shape or form will consume it.
//Will consume
auto result = find(inRange, needle);
//Will consume
foreach(e; inRange) {}
If it's a forward range and it's a reference type, then it will be consumed whenever you iterate over it, but you can call save to get a copy of it, and consuming the copy won't consume the original (nor will consuming the original consume the copy).
//Will consume
auto result = find(refRange, needle);
//Will consume
foreach(e; refRange) {}
//Won't consume
auto result = find(refRange.save, needle);
//Won't consume
foreach(e; refRange.save) {}
Where things get more interesting is forward ranges which are value types (or arrays). They act the same as any forward range with regards to save, but they differ in that simply passing them to a function or using them in a foreach implicitly saves them.
//Won't consume
auto result = find(valRange, needle);
//Won't consume
foreach(e; valRange) {}
//Won't consume
auto result = find(valRange.save, needle);
//Won't consume
foreach(e; valRange.save) {}
So, if you're dealing with an input range which isn't a forward range, it will be consumed regardless. And if you're dealing with a forward range, you need to call save if you want to guarantee that it isn't consumed; otherwise, whether it's consumed or not depends on its type.
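A rough Java analogy, offered with the caveat that D ranges are richer than Java's iteration types: an input range behaves like a bare Iterator (single-pass, consumed as you iterate), while calling save on a forward range is like asking the Iterable for a fresh Iterator.

```java
import java.util.Iterator;
import java.util.List;

public class SinglePass {
    // Sums whatever is left in the iterator, consuming it entirely.
    static int drain(Iterator<Integer> it) {
        int sum = 0;
        while (it.hasNext()) sum += it.next();
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3);

        // Like an input range: the Iterator itself is single-pass.
        Iterator<Integer> once = xs.iterator();
        System.out.println(drain(once)); // 6
        System.out.println(drain(once)); // 0: already consumed

        // Like save on a forward range: the Iterable hands out a fresh iterator.
        System.out.println(drain(xs.iterator())); // 6
    }
}
```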
With regards to ref, if you declare a range-based function to take its argument by ref, then it won't be copied, so it won't matter whether the range passed in is a reference type or not, but it does mean that you can't pass an rvalue, which would be really annoying, so you probably shouldn't use ref on a range parameter unless you actually need it to always mutate the original (e.g. std.range.popFrontN takes a ref because it explicitly mutates the original rather than potentially operating on a copy).
As for calling range-based functions with forward ranges, value type ranges are most likely to work properly, since far too often, code is written and tested with value type ranges and isn't always properly tested with reference types. Unfortunately, this includes Phobos' functions (though that will be fixed; it just hasn't been properly tested for in all cases yet - if you run into any cases where a Phobos function doesn't work properly with a reference type forward range, please report it). So, reference type forward ranges don't always work as they should.
Sorry, I can't fit this into a comment :D. Consider if Range were defined this way:
interface Range {
    void doForeach(void delegate() myDel);
}
And your function looked like this:
void myFunc(Range r) {
    r.doForeach(() {
        //blah
    });
}
You wouldn't expect anything strange to happen when you reassigned r, nor would you expect to be able to modify the caller's Range. I think the problem is that you are expecting your template function to be able to account for all of the variation in range types, while still taking advantage of the specialization. That doesn't work. You can apply a contract to the template to take advantage of the specialization, or use only the general functionality.
Does this help at all?
Edit (what we've been talking about in comments):
void funcThatDoesntRuinYourRanges(R)(R r)
if (isForwardRange!R)
{
    //do some stuff
}
Edit 2: Looking at std.range, it seems isForwardRange simply checks whether save is defined, and save is just a primitive that makes an independent copy of the range. The docs specify that save is not defined for e.g. files and sockets.
The short of it: ranges are consumed. This is what you should expect and plan for.
The ref on the foreach plays no role in this; it only relates to the values returned by the range.
The long of it: ranges are consumed, but may get copied. You'll need to look at the documentation to decide what will happen. Value types get copied, so a range may not be modified when passed to a function, but you cannot rely on that just because the range comes as a struct: the underlying data stream may be a reference, e.g. a FILE. And of course a ref function parameter will add to the confusion.
Say your print function looks like this:
void print(R)(R r) {
    foreach (x; r) {
        writeln(x);
    }
}
Here, r is passed into the function by value, using the generic type R, so you don't need ref here (and auto would give a compilation error). This will print the contents of r, item by item. (I seem to remember there being a way to constrain the generic type to that of a range, because ranges have certain properties, but I forget the details!)
Anyway:
auto myRange = [1, 2, 3];
print(myRange);
print(myRange);
...will output:
1
2
3
1
2
3
If you change your function to (presuming x++ makes sense for your range):
void print(R)(R r) {
    foreach (x; r) {
        x++;
        writeln(x);
    }
}
...then each element will be increased before being printed, but this is using copy semantics. That is, the original values in myRange won't be changed, so the output will be:
2
3
4
2
3
4
If, however, you change your function to:
void print(R)(R r) {
    foreach (ref x; r) {
        x++;
        writeln(x);
    }
}
...then x reverts to reference semantics, referring to the original elements of myRange. Hence the output will now be:
2
3
4
3
4
5