Test equality of collections with custom matcher - unit-testing

How do you compare collections in a Hamcrest JUnit test in Kotlin using a custom comparator for the entries?
Assume we have the following data class with a custom equals method which defines equality as the strings having the same length:
data class Data (val s : String) {
override fun equals(other: Any?): Boolean {
if (this === other) return true
if (javaClass != other?.javaClass) return false
other as Data
if (s.length != other.s.length) return false
return true
}
}
Now we can easily test whether two lists contain equal elements:
val a = listOf(Data("Hello"), Data("World"))
val b = listOf(Data("world"), Data("hello"))
assertThat(a, everyItem(IsIn(b)))
assertThat(b, everyItem(IsIn(a)))
Here the custom equals method is used to compare whether Data objects are equal.
But if we want to do the same with maps, it does not work this way, as map entries are compared for (shallow) equality:
val a = mapOf(1 to Data("Hello"), 2 to Data("World"))
val b = mapOf(1 to Data("hello"), 2 to Data("world"))
assertThat(a.entries.toSet(), everyItem(IsIn(b.entries.toSet())))
assertThat(b.entries.toSet(), everyItem(IsIn(a.entries.toSet())))
This gives the following AssertionError:
Expected: every item is one of {<1=Data(s=hello)>, <2=Data(s=world)>}
but: an item was <1=Data(s=Hello)>
If Data is a regular class rather than a data class, the check only tests whether the key and value objects are the same instances (it does not call equals on them), and the error is
Expected: every item is one of {<1=Data#5e918d2>, <2=Data#6c11b92>}
but: an item was <1=Data#42628b2>
How can we specify how the Map.Entry objects are compared to each other, for example that the custom equals method is used? In general, when testing collections for equality, can we pass a custom matcher that determines whether two entries are equal?

Related

kotlin: sort List<T> with T being a class

I have a public class defined in Kotlin: public class Edge(val v: Int, val u: Int, val weight: Double) that helps me define weighted edges of a graph.
Now, in another class, I need to make a list, which I defined as var Sides = mutableListOf<Edge>() but I need to order the list in ascending order which depends on the third parameter of Edge (which is weight). So that if I have the list:
Sides = {Edge(4, 8, 4.1), Edge(20, 9, 7.5), Edge(5, 4, 0.0)}, it becomes:
Sides = {Edge(5, 4, 0.0), Edge(4, 8, 4.1), Edge(20, 9, 7.5)}
Is there any function like .sort() I can use to order this list? or do I have to manually make a function of a sorting method for this?
Thanks in advance
For a mutable collection like MutableList, you can use the sortBy function to sort the original list itself.
sides.sortBy { it.weight }
And, if you have an immutable collection like List, you can use the sortedBy function which returns a new sorted list.
val sortedList = sides.sortedBy { it.weight }
Also, you have sortByDescending and sortedByDescending for sorting in descending order.
You're looking for sortBy. Given a list of T, sortBy takes a mapping function from T to R (where R is some type which has an ordering defined on it). Consider
Sides.sortBy { n -> n.weight }
You have two basic approaches:
Give Edge a natural ordering. Then all the sort functions will use it by default — as will anything else that can use an ordering (such as the order of keys in a SortedMap, and the binarySearch() method).
You do this by implementing the Comparable interface.  This has a single method, compareTo(), which could be as simple as:
public class Edge(val v: Int, val u: Int, val weight: Double) : Comparable<Edge> {
override fun compareTo(other: Edge) = weight.compareTo(other.weight)
}
However, that doesn't give a consistent ordering to instances which have the same weight, so you might also want to use the other properties as tie-breakers, e.g.:
override fun compareTo(other: Edge)
= weight.compareTo(other.weight).takeIf{ it != 0 }
?: v.compareTo(other.v).takeIf{ it != 0 }
?: u.compareTo(other.u)
(There are a few subtleties in implementing that, especially if you're not also overriding equals() to correspond directly. The Java documentation is worth reading.)
Note that a data class does not implement Comparable automatically: it generates equals(), hashCode(), toString(), copy() and the componentN() functions, but no compareTo(). If you want a natural ordering, you still have to implement Comparable yourself.
Provide an ordering when you sort.
Other answers discuss this. Perhaps the simplest way is:
sides.sortBy{ it.weight }
Though there are many alternatives, such as:
sides.sortWith{ a, b -> a.weight.compareTo(b.weight) }
Or you could create a Comparator instance that could be reused as necessary:
val comparator = Comparator<Edge>{ o1, o2 -> o1.weight.compareTo(o2.weight) }
sides.sortWith(comparator)
Again, a comparator can be used with many functions in the standard library, so you can avoid repeating the weight comparison code.
Which approach to choose depends on your needs.
If it makes intuitive sense for your edges to be always arranged by weight, then a natural ordering would be a good fit. That's concise to code: you just have to implement Comparable in one place, and you then get the full benefit of ordering everywhere. (Of course, you can only do that if you have control over the Edge source code.)
On the other hand, if the weight-based ordering is specific to a particular method — if you might want different orderings in other places — then it would make more sense to specify the ordering when you sort.
Of course, you can do both, if needed: you could give your object a natural ordering which applied for most things, but then specify a different ordering for particular operations.

Implementing surjective data structure?

I am interested in performing the following operations on a set of data.
First, we are given a set of keys, as an example:
vector<int> keys{1,2,3,4,5,6};
Each of these keys is understood to be pointing to a unique entry (which is not important to specify, rather what is important is the relation whether each key is pointing to a separate entry, or some keys are pointing to the same entry). Initially, we do not know whether any key is pointing to the same entry or not, so we start out with a data structure that treats all entries as separate for each key:
surjectiveData<int> data;
data.populateUnique(keys.begin(),keys.end());
Graphically, we can illustrate the current state of data as
where we use labels a,b,c,d,e,f to keep track of the unique entries in data. Now, consider adding additional information on which keys are pointing to the same entry. For example:
vector<pair<int,int>> identifications{make_pair(1,2),make_pair(3,4),make_pair(2,4),make_pair(5,6)};
data.couple(identifications.begin(),identifications.end());
The couple method of surjectiveData goes through the pairs provided and makes them point to the same unique entry. Graphically, the four identifications would in turn change data as follows:
and now there are only two unique entries in data, which here we denote abcd and ef. Note that once two or more keys point to the same entry, it does not matter which of these keys is identified with which of separate keys, all of them point to the same entry after identification.
Now that we are done with specifying key identifications, we could think of using data as follows. For example, we could ask what is the effective number of unique remaining entries
cout<<data.size()<<endl; // 2
Or, we could iterate through the entries and check how many keys point to each of them
for(auto it=data.begin();it!=data.end();it++){
cout<<it->size()<<" ";// 4 2
}
Ideally, internally the structure should take constant time for each identification, if possible.
I tried to search for such a data structure in the standard library, but could not find any. Did I miss it? Perhaps there is a smart way to implement it based on more basic objects? If so, what would be a minimal example for integers?
The operations you describe can be supported with a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
This is a linked data structure that supports 3 operations:
makeSet() creates a new singleton set and returns its element
union(a,b) given two elements, merges the sets that contain them. One element of each set will be the "representative" of that set
find(a) returns the representative of the set that contains a.
All operations take pretty much constant amortized time.
I usually implement this data structure in a single vector, where each index denotes a set element. If its value is > 0, then it's a set representative and the value is the size of the set. If its value is < 0, then its value is ~p, where p is its "parent" element in the same set. Sometimes I use the 0 value for "uninitialized".
It's not hard to keep track of the number of sets.
In C++, my usual implementation would look like this:
#include <vector>

class DisjointSets {
    unsigned num_sets = 0;
    std::vector<int> sets; // >= 0: representative, value is the set size; < 0: ~parent
public:
    // Create a new singleton set and return its element
    unsigned make_set() {
        unsigned ret = (unsigned)sets.size();
        sets.push_back(1);
        ++num_sets;
        return ret;
    }
    // Find the representative element of an element's set
    unsigned find(unsigned x) {
        int p = sets[x];
        if (p >= 0) {
            return x;
        }
        unsigned root = find((unsigned)~p);
        sets[x] = ~(int)root; // path compression; might be the same
        return root;
    }
    // Merge the sets that contain two elements
    // (named unite because union is a reserved word in C++)
    // returns true if a merge was done
    bool unite(unsigned a, unsigned b) {
        a = find(a);
        b = find(b);
        if (a == b) {
            return false;
        }
        if (sets[a] > sets[b]) {
            sets[a] += sets[b]; // add sizes
            sets[b] = ~(int)a;
        } else {
            sets[b] += sets[a]; // add sizes
            sets[a] = ~(int)b;
        }
        --num_sets;
        return true;
    }
    // get the size of an element's set
    unsigned set_size(unsigned x) {
        return (unsigned)sets[find(x)];
    }
    // get the number of sets
    unsigned set_count() const {
        return num_sets;
    }
};
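As a rough sketch of how the question's interface could sit on top of a disjoint-set structure: the class below reuses the method names populateUnique, couple and size from the question, while SurjectiveData, groupSizes and questionExample are hypothetical names introduced here. It keeps separate parent and size arrays rather than the single-vector encoding described above, and it maps keys to slots with a hash map, which is only an assumption about how arbitrary int keys might be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical wrapper mirroring the question's interface; not a standard class.
class SurjectiveData {
    std::unordered_map<int, std::size_t> index_;  // key -> slot
    std::vector<std::size_t> parent_;             // slot -> parent slot (roots point to themselves)
    std::vector<std::size_t> size_;               // group size, meaningful for roots only
    std::size_t groups_ = 0;

    // Find the root slot, shortening the path as we go (path halving).
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];
            x = parent_[x];
        }
        return x;
    }

public:
    // Register every key as its own group; repeated keys are ignored.
    template <typename It>
    void populateUnique(It first, It last) {
        for (; first != last; ++first) {
            if (index_.emplace(*first, parent_.size()).second) {
                parent_.push_back(parent_.size());
                size_.push_back(1);
                ++groups_;
            }
        }
    }

    // Merge the groups of each pair of keys (union by size).
    // Throws std::out_of_range for a key that was never registered.
    template <typename It>
    void couple(It first, It last) {
        for (; first != last; ++first) {
            std::size_t a = find(index_.at(first->first));
            std::size_t b = find(index_.at(first->second));
            if (a == b) continue;
            if (size_[a] < size_[b]) std::swap(a, b);
            parent_[b] = a;
            size_[a] += size_[b];
            --groups_;
        }
    }

    // Number of distinct entries (groups) remaining.
    std::size_t size() const { return groups_; }

    // One size per group, in order of each group's root slot.
    std::vector<std::size_t> groupSizes() const {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < parent_.size(); ++i)
            if (parent_[i] == i) out.push_back(size_[i]);
        return out;
    }
};

// Rebuilds the example from the question.
inline SurjectiveData questionExample() {
    std::vector<int> keys{1, 2, 3, 4, 5, 6};
    std::vector<std::pair<int, int>> identifications{{1, 2}, {3, 4}, {2, 4}, {5, 6}};
    SurjectiveData data;
    data.populateUnique(keys.begin(), keys.end());
    assert(data.size() == 6);  // all keys start out separate
    data.couple(identifications.begin(), identifications.end());
    return data;
}
```

With the question's data, questionExample() ends up with two groups of sizes 4 and 2. Each identification costs near-constant amortized time on average, since the hash lookups are constant on average and the union-find steps are almost constant.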

Implementing ranged-loop in custom hashed set: accessing only entries not marked as empty

Container basic setup
I have implemented a simple custom unordered set container, that uses hashing. Internally, it stores data like this:
template <typename T>
class Set
{
T *data = nullptr;
bool *emptyList = nullptr;
int size = 0;
... (inner methods go here)
};
That is, it stores two arrays. One called data with the actual values of templated type T, and another called emptyList with bool values that mark whether at that position the set is considered empty or not.
This way, linear probing to store new values and erasing entries both become much cheaper: inserting is just a matter of finding the next index where emptyList[index] is true, and erasing is just a matter of setting emptyList[index] to true.
Problem with the ranged for-loop
Currently, I allow iteration over the values stored in the set like for(auto i : set_instance) by having the following public member functions in the set class:
T* begin() const { return data; }
T* end() const { return data + size; } // assuming size is the number of slots in data
The problem with that, of course, is that a ranged for-loop also accesses the entries in data that should not be accessed since they are marked in emptyList as being empty.
Is there a way for me to make it so that when the user tries to iterate over the set with ranged loops, only the entries in data that correspond to the entries in emptyList that are not marked as true are actually accessed/processed by the ranged loop?
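One possible way to do this, sketched under assumptions: have begin() and end() return a small iterator object instead of a raw T*, and let that iterator step over slots flagged as empty. SlotSet, putAt and collectExample are hypothetical names, std::vector replaces the raw arrays for brevity, and hashing and probing are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for the question's Set<T>; only the iteration part is modelled.
template <typename T>
class SlotSet {
    std::vector<T> data_;
    std::vector<bool> empty_;  // true = slot is unused

public:
    explicit SlotSet(std::size_t capacity)
        : data_(capacity), empty_(capacity, true) {}

    // Helper for the example: store a value in a given slot.
    // A real set would pick the slot by hashing and probing.
    void putAt(std::size_t slot, const T& value) {
        data_[slot] = value;
        empty_[slot] = false;
    }

    // Forward iterator that only ever stops on occupied slots.
    class const_iterator {
        const SlotSet* set_;
        std::size_t pos_;

        // Advance pos_ until it reaches an occupied slot or the end.
        void skipEmpty() {
            while (pos_ < set_->data_.size() && set_->empty_[pos_]) ++pos_;
        }

    public:
        const_iterator(const SlotSet* set, std::size_t pos) : set_(set), pos_(pos) {
            skipEmpty();
        }
        const T& operator*() const { return set_->data_[pos_]; }
        const_iterator& operator++() {
            ++pos_;
            skipEmpty();
            return *this;
        }
        bool operator!=(const const_iterator& other) const { return pos_ != other.pos_; }
    };

    const_iterator begin() const { return const_iterator(this, 0); }
    const_iterator end() const { return const_iterator(this, data_.size()); }
};

// Range-based for over a set with two occupied slots out of six.
inline std::vector<int> collectExample() {
    SlotSet<int> s(6);
    s.putAt(1, 10);
    s.putAt(4, 40);
    std::vector<int> seen;
    for (int v : s) seen.push_back(v);  // empty slots are never dereferenced
    assert(seen.size() == 2);
    return seen;
}
```

The range-based for loop only needs begin(), end(), operator!=, operator++ and operator*, so this minimal iterator is enough for that use. Standard algorithms would additionally expect the usual iterator typedefs.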

How to write this java code in clojure (tips)

The question
I'm learning Clojure and I love learning languages by example. But I don't like just getting a complete answer without having to think about it myself.
So what I want is just some tips on what functions I might need and maybe some other clues.
The answer that I will accept is the one that gave me the necessary building blocks to create this.
public class IntervalMap<K extends Comparable<K>, V> extends
TreeMap<K, V> {
V defaultValue = null;
public IntervalMap(V defaultValue) {
super();
this.defaultValue = defaultValue;
}
/**
*
* Get the value corresponding to the given key
*
* @param key
* @return The value corresponding to the largest key 'k' so that
* " k is the largest value while being smaller than 'key' "
*/
public V getValue(K key) {
// if it is equal to a key in the map, we can already return the
// result
if (containsKey(key))
return super.get(key);
// Find largest key 'k' so that
// " k is the largest value while being smaller than 'key' "
// highest key
K k = lastKey();
while (k.compareTo(key) >= 0) { // compareTo may return any negative value, not just -1
k = lowerKey(k);
if (k == null)
return defaultValue;
}
return super.get(k);
}
@Override
public V get(Object key) {
return getValue((K) key);
}
}
Update
I want to recreate the functionality of this class
For examples you can go here: Java Code Snippet: IntervalMap
I'd be looking at some combination of:
(sorted-map & key-vals) - will allow you to create a map ordered by keys. you can supply your own comparator to define the order.
(contains? coll key) - tests whether a collection holds an item identified by the argument (this is a common source of confusion when applied to vector, where contains? returns true if there is an element at the given index rather than the given value)
(drop-while pred coll) lets you skip items in a collection while a predicate is true
I would just use a map combined with a function to retrieve the closest value given a certain key. Read about maps and functions if you want to know more.
If you want to be able to mutate the data in the map, store the map in one of clojure's mutable storage facilities, for example an atom or ref. Read about mutable data if you want to know more.
You could use a function that has closed over the default value and/or the map or atom referring to a map. Read about closures if you want to know more.
The use of Protocols might come in handy here too. So, read about that too. Enough to get you going? ;-)
A few things that I used in my implementation of an interval-get function:
contains?, like @sw1nn suggested, is perfect for checking whether a map contains a particular key.
keys can be used to get all of the keys in a map.
filter keeps all of the elements in a sequence meeting some predicate.
sort, as you may have guessed, sorts a sequence.
last returns the last element in a sequence or nil if the sequence is empty.
if-let can be used to bind and act on a value if it is not falsey.
The usage of the resulting function was as follows:
(def m {0 "interval 1", 5 "interval 2"})
(interval-get m 3) ; "interval 1"
(interval-get m 5) ; "interval 2"
(interval-get m -1) ; nil
If you want to implement the code block "conceptually" in clojure then the existing answers already answer your question, but in case you want the code block to be "structurally" same in clojure (i.e the subclassing etc) then have a look at gen-class and proxy in clojure documentation.

Is there a data structure like stream, but weak?

Weak as in weak references. Basically, I need a sequence of numbers where some of them can be unallocated when they aren't needed anymore.
scalaz.EphemeralStream is what you want.
Views provide you with a lazy collection, where each value is computed as it is needed.
One thing you could do is create an Iterable instead of a Stream. Your Iterable needs to provide an iterator method, which returns an iterator with hasNext and next methods.
When you loop over the Iterable, hasNext and next will be called to generate the elements as they are needed, but the elements are not stored anywhere (unlike a Stream, which keeps the elements it has already computed).
Simple example:
class Numbers extends Iterable[Int] {
def iterator = new Iterator[Int] {
private var num = -1
def hasNext = num < 99
def next() = { num += 1; num }
}
}