I would like to find, using Java streams, an integer i in the range {1,...,1000} for which the function sin(i/100) is smallest. I tried to use min with a comparator, as in this question:
Comparator<Integer> sineComparator = (i,j) ->
Double.compare(Math.sin(i/100.0), Math.sin(j/100.0));
IntStream.range(1,1000)
.min(sineComparator);
but it did not work since an IntStream does not have a variant of min that accepts a comparator. What can I do?
You have to use boxed() to convert the IntStream to a Stream<Integer>, which will allow you to use min with a comparator (note that IntStream.range(1, 1000) excludes 1000; use rangeClosed if the upper bound should be included):
IntStream.range(1, 1000)
.boxed()
.min(sineComparator)
Alternatively, you can avoid boxing but at the cost of reduced clarity:
IntStream.range(1,1000)
.reduce((i,j) -> sineComparator.compare(i, j) <= 0 ? i : j)
On a separate note, you can create your comparator using Comparator.comparingDouble:
Comparator<Integer> sineComparator = Comparator.comparingDouble(i -> Math.sin(i/100.0));
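Putting the pieces together, here is a complete, runnable sketch (the class and method names are illustrative):

```java
import java.util.Comparator;
import java.util.stream.IntStream;

public class MinSine {
    // Integer i in [1, 1000) that minimizes sin(i / 100).
    static int argMinSine() {
        return IntStream.range(1, 1000)
                .boxed()
                .min(Comparator.comparingDouble(i -> Math.sin(i / 100.0)))
                .orElseThrow(); // the stream is non-empty, so a value exists
    }

    public static void main(String[] args) {
        // sin(x) is smallest near x = 3*pi/2 ~ 4.712, i.e. i ~ 471
        System.out.println(argMinSine()); // prints 471
    }
}
```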
Related
I have an array of N pairs (v1, v2) where v1 <= v2. These are supposed to represent events in time that start at v1 and end at v2. They can be equal, in which case the event is instantaneous. The array is sorted by starting time, v1.
For a given range (L, R), I would like to find any pair where L <= v1 <= R or L <= v2 <= R. The idea here is to get events starting, happening or ending in the given range.
My main problem is efficiency. The array could contain hundreds of thousands of events, so a linear search through all pairs is not an option.
I read a bit about k-d trees, but the problem is that a range query there would only return events with L <= v1 <= R AND L <= v2 <= R. That is, it would only return events that actually happen (start AND end) in the range, whereas I need start OR end (or both, obviously).
I also thought about keeping two lookup tables (I use double for time):
std::map<double, Event*> startPoints;
std::map<double, Event*> endPoints;
and then using std::map::find (or lower_bound for ranges) on both of them and merging the results.
Just looking for advice, whether this is a good solution or if there is a more clever way.
EDIT:
Re-thinking about this, it is more complicated. Here is an example of the expected results.
L < R : Range is large enough
|---Ev1---| |---Ev3---| |---Ev5---|
|---Ev2---| |---Ev4---|
| |
L R
Here I would like to get Ev2 (which is ending in the range), Ev3 (which is happening in the range), and Ev4 (which is starting in the range).
L < R: Range too small for a complete event
|---Ev1---| |---Ev3---| |---Ev5---|
|---Ev2---| |---Ev4---|
| |
L R
Here I would like to get Ev3, as it is currently running in the range, and Ev4, as it is starting in the range.
L == R: If I want to know what happens at one point in time
|---Ev1---| |---Ev3---| |---Ev5---|
|---Ev2---| |---Ev4---|
|
LR
Here I would like only Ev2 as it is the only one currently running.
As you need to handle three cases (starting, happening, or ending in the given range), we can split the problem into three parts:
starting: v1 lies in [L,R].
ending: v2 lies in [L,R].
The third case could be formulated as v1 <= R and L <= v2, but the first two cases partially overlap with it, so we will use a different formulation to avoid double counting:
happening: v1 < L and R < v2
Well, it is easy to handle the first case in O(log n + k) time (logarithmic plus the number of reported events) if we sort the array of events by v1. The same trick works for the second case.
The third case is trickier. Let's draw: the pink area represents all intervals with L <= R, the red dot is an interval, and the greenish area represents all the possible events we want to capture. To do such a capture, one can use a k2-tree.
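The two sorted-array cases can be sketched briefly. Assuming the events' start times are copied into a sorted array, the "starting in [L, R]" query is a binary search plus a scan over the reported hits (shown in Java for uniformity with the other examples in this document, even though the question's own code is C++; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class EventRangeQuery {
    // First index i with keys[i] >= target (keys sorted ascending).
    static int lowerBound(double[] keys, double target) {
        int a = 0, b = keys.length;
        while (a < b) {
            int m = (a + b) >>> 1;
            if (keys[m] < target) a = m + 1; else b = m;
        }
        return a;
    }

    // Indices of events whose start time lies in [L, R]:
    // O(log n + k), where k is the number of reported events.
    static List<Integer> startingIn(double[] starts, double L, double R) {
        List<Integer> out = new ArrayList<>();
        for (int i = lowerBound(starts, L); i < starts.length && starts[i] <= R; i++) {
            out.add(i);
        }
        return out;
    }
}
```

The symmetric "ending in [L, R]" case uses an array sorted by v2; only the third case (v1 < L and R < v2) needs a two-dimensional structure such as the k2-tree mentioned above.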
An indexed approach, such as a Boost.ICL (Interval Container Library) solution, is fine.
That said, you could easily use a std::vector for this, even unsorted. As long as you are within the range of some 100,000 or even 1,000,000 events you should be fine (provided you store actual values, not pointers, in the vector, since chasing pointers can be slow); the exact numbers will of course depend on your thresholds.
struct MyEvent {
    double v1; // you use double for time
    double v2;
};
std::vector<MyEvent> events;
Here is an example using 1.000.000 elements:
http://coliru.stacked-crooked.com/a/9a6d90348f6915e1
and the search runs in 42 ms; it consists of one comparison and an optional copy per element. Your case may be a bit different, but it should be comparable.
Going further, you could get more power by parallelizing your search in some way, e.g. with std::for_each.
std::map: finding an element is O(log n).
If your keys are unique and you don't have a memory problem, you can use std::unordered_map, whose lookup complexity is amortized O(1).
Also, you don't need to create two maps:
std::unordered_map<double, std::pair<Event*, Event*>> StartEndPoints;
If your keys aren't unique you can use std::unordered_multimap, but if your keys are repeated a lot, the lookup complexity can degrade to O(n).
I'd suggest not using double as the key type directly:
std::hash<double> hashing;
auto temp = hashing(key); // decltype(temp) will be std::size_t
std::unordered_map<std::size_t, std::pair<Event*, Event*>> StartEndPoints;
This was taken from LeetCode. Basically, given a string composed of a few unique characters that each have an associated integer value, I need to quickly compute the total integer value of the string. I thought enums would be useful, since you know what characters your strings are composed of.
The enum lists the types of characters that can be in my string (you can see that it's limited). If a character with a smaller value comes before a character with a bigger value, like IV, then I subtract the preceding character's value from the one after it; otherwise I add. The code below is my attempt, but I can't get enums to work with my algorithm...
std::string s = "III";
int sum = 0;
enum {I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000};
// O(n) iteration.
for (int i = 0; i < s.length(); i++) {
    // Must subtract.
    if (s[i] < s[i+1]) {
        sum += s[i+1] - s[i];
    }
    // Add.
    else {
        sum += s[i];
    }
}
std::cout << "sum is: " << sum;
My questions then are: 1) Is using an enum with a string possible? 2) I know it's possible with an unordered_map, but I think enums would be much quicker.
If you don't mind a minor memory overhead, you can do something like this:
int table[256];
table['I']=1;
table['V']=5;
...
and then
sum += table[s[i]];
and so on. This approach is guaranteed to be O(1), which is basically the fastest solution you are able to get. You can also use std::array instead of a POD array, encapsulate all this in some class and add assertions, but this is the idea.
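A complete sketch of this table approach, including the subtractive rule from the question and a bounds check on the lookahead (written in Java for uniformity with the other examples in this document; a C++ version is line-for-line analogous, and the class and method names are illustrative):

```java
public class RomanSum {
    // Value table indexed directly by character, as sketched above.
    static final int[] TABLE = new int[256];
    static {
        TABLE['I'] = 1; TABLE['V'] = 5; TABLE['X'] = 10; TABLE['L'] = 50;
        TABLE['C'] = 100; TABLE['D'] = 500; TABLE['M'] = 1000;
    }

    static int value(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            int cur = TABLE[s.charAt(i)];
            // Subtractive pair: a smaller value directly before a larger one ("IV").
            // The i + 1 < s.length() check avoids reading past the end of the string.
            if (i + 1 < s.length() && cur < TABLE[s.charAt(i + 1)]) {
                sum -= cur;
            } else {
                sum += cur;
            }
        }
        return sum;
    }
}
```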
2) I know it's possible to do with a unordered_map but I think enums is much quicker.
You're comparing apples with oranges. First, an enum is not a container; it's basically a list of named constants.
If you mean the access time of operator[]:
For unordered_map: "Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity."
For std::string, operator[] is also constant-time access.
1) Is using enum with a string possible
No. An enum constant is basically an "alias" for its value. Note that each string is a sequence of characters:
V != "V"
It is not possible to convert a char or a string to an enum without some kind of mapping, because the compiler replaces each enum constant with its underlying value during compilation. So you cannot dynamically access an enum by its name stored in a string.
You have to use either one of the map family or an if/else construct to achieve this.
I have a vector of doubles which are ordered.
std::vector<double> vec;
for(double x{0.0}; x < 10.0 + 0.5; x += 1.0) vec.push_back(x);
I am trying to use std::find_if to find the first element, and its corresponding index, of this vector for which y < element. Where y is some value.
For example, if y = 5.5 then the relevant element is element 6.0 which has index 6, and is the 7th element.
I suspect that a lambda function could be used to do this.
Can a lambda function be used to do this?
If so, how do I implement a find_if statement to do what I want?
One line of code is needed if a lambda function is used. Here x is a local double (the value we are searching for): const double x = 5.5;
auto it = std::find_if(vec.cbegin(), vec.cend(), [x] (double element) { return (x < element); });
Breaking this down, we have:
std::find_if(vec.cbegin(), vec.cend(), lambda)
The lambda is:
[x] (double element) { return (x < element); }
This means:
capture the local variable x for use inside the lambda body
the lambda function takes a single double as an argument (find_if passes each element of the vector to this function, as element, while iterating)
the function body returns true when x < element; find_if stops at the first element for which this holds
Note: I answered this question myself while researching possible solutions. The context in which I am using this in my own program is slightly different: there, the vector is a specification of numerical boundaries, and I am looking to find lower < x < higher. Therefore the code I used differs slightly from this, in that I had to shift the returned iterator by one place because of how the numerical boundaries are specified. If I introduced a bug in transcribing the solution, let me know.
Just use std::upper_bound() - it is more efficient (it uses binary search) and does not need a lambda:
auto it = std::upper_bound( vec.begin(), vec.end(), x );
If you need to find lower < x < upper you can use std::equal_range(), but you would need additional logic to find the proper lower, since std::lower_bound gives the first element that is not less than x; you need to make sure lower is strictly less than x and not equal to it.
live example
Looks like you haven't been introduced to std::upper_bound yet:
std::upper_bound(vec.begin(), vec.end(), x);
If you need the index, you can add a counter. Something like
double x = 5;
int cnt = -1;
std::find_if(vec.cbegin(), vec.cend(),
             [&] (double element) { ++cnt; return (x < element); });
(or simply compute std::distance(vec.cbegin(), it) from the returned iterator).
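If the index is what you ultimately need, searching over indices directly avoids the mutable counter. A small sketch of the same first-match search, written with Java streams for uniformity with the other examples in this document (the question itself is C++); the method name is illustrative:

```java
import java.util.stream.IntStream;

public class FirstGreater {
    // Index of the first element strictly greater than x, or -1 if none.
    static int firstGreaterIndex(double[] vec, double x) {
        return IntStream.range(0, vec.length)
                .filter(i -> vec[i] > x)
                .findFirst()
                .orElse(-1);
    }
}
```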
I have a list of key-value-pairs and I want to filter a list where every key parameter only occurs once.
So that a list of e.g. {Pair(1,2), Pair(1,4), Pair(2,2)} becomes {Pair(1,2), Pair(2,2)}.
It doesn't matter which Pair gets filtered out as I only need the size
(maybe there's a different way to get the amount of pairs with pairwise different key values?).
This all is again happening in another stream of an array of lists (of key-value-pairs) and they're all added up.
I basically want the amount of collisions in a hashmap.
I hope you understand what I mean; if not please ask.
public int collisions() {
    return Stream.of(t)
            .filter(l -> l.size() > 1)
            .filter(/* convert l to a list of Pairs with pairwise different keys */)
            .mapToInt(l -> l.size() - 1)
            .sum();
}
EDIT:
public int collisions() {
    return Stream.of(t)
            .forEach(currentList = stream().distinct().collect(Collectors.toList())) // Compiler error, how do I do this?
            .filter(l -> l.size() > 1)
            .mapToInt(l -> l.size() - 1)
            .sum();
}
I overrode equals on Pair to return true if the keys are identical, so now I can use distinct to remove "duplicates" (Pairs with equal keys).
Is it possible, in forEach, to replace the current element with the same list "distincted"? If so, how?
Regards,
Claas M
I'm not sure whether you want the sum of the number of collisions per list, or the number of collisions if all lists were merged into a single one first. I assumed the former, but if it's the latter, the idea does not change much.
This how you could do it with Streams:
int collisions = Stream.of(lists)
.flatMap(List::stream)
.mapToInt(l -> l.size() - (int) l.stream().map(p -> p.k).distinct().count())
.sum();
Stream.of(lists) will give you a Stream<List<List<Pair<Integer, Integer>>>> with a single element.
Then you flatMap it, so that you have a Stream<List<Pair<Integer, Integer>>>.
From there, you mapToInt each list by subtracting from its original size the number of unique Pair keys it contains (l.stream().map(p -> p.k).distinct().count()).
Finally, you call sum to get the overall number of collisions.
Note that you could use mapToLong to get rid of the cast, but then collisions has to be a long (which may be more correct if each list has a lot of "collisions").
For example given the input:
List<Pair<Integer, Integer>> l1 = Arrays.asList(new Pair<>(1,2), new Pair<>(1,4), new Pair<>(2,2));
List<Pair<Integer, Integer>> l2 = Arrays.asList(new Pair<>(2,2), new Pair<>(1,4), new Pair<>(2,2));
List<Pair<Integer, Integer>> l3 = Arrays.asList(new Pair<>(3,2), new Pair<>(3,4), new Pair<>(3,2));
List<List<Pair<Integer, Integer>>> lists = Arrays.asList(l1, l2, l3);
It will output 4 as you have 1 collision in the first list, 1 in the second and 2 in the third.
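A self-contained version of this approach, with a minimal Pair record standing in for whatever Pair class the question uses:

```java
import java.util.*;
import java.util.stream.*;

public class Collisions {
    record Pair(int k, int v) {}

    static int collisions(List<List<Pair>> lists) {
        // Per list: size minus the number of distinct keys = collisions in that bucket.
        return lists.stream()
                .mapToInt(l -> l.size()
                        - (int) l.stream().map(Pair::k).distinct().count())
                .sum();
    }

    public static void main(String[] args) {
        List<Pair> l1 = List.of(new Pair(1, 2), new Pair(1, 4), new Pair(2, 2));
        List<Pair> l2 = List.of(new Pair(2, 2), new Pair(1, 4), new Pair(2, 2));
        List<Pair> l3 = List.of(new Pair(3, 2), new Pair(3, 4), new Pair(3, 2));
        System.out.println(collisions(List.of(l1, l2, l3))); // prints 4
    }
}
```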
Don't use a stream. Dump the list into a SortedSet with a custom comparator and diff the sizes:
List<Pair<K, V>> list; // given this
Set<Pair<K, V>> set = new TreeSet<>((a, b) -> a.getKey().compareTo(b.getKey()));
set.addAll(list);
int collisions = list.size() - set.size();
If the key type isn't comparable, alter the comparator lambda accordingly.
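A runnable sketch of this set-based approach, again with an illustrative Pair record and a comparator on keys only:

```java
import java.util.*;

public class TreeSetCollisions {
    record Pair(Integer k, Integer v) {}

    // Number of key collisions: list size minus the number of distinct keys,
    // where the TreeSet keeps at most one Pair per key.
    static int collisions(List<Pair> list) {
        Set<Pair> set = new TreeSet<>(Comparator.comparing(Pair::k));
        set.addAll(list);
        return list.size() - set.size();
    }

    public static void main(String[] args) {
        List<Pair> list = List.of(new Pair(1, 2), new Pair(1, 4), new Pair(2, 2));
        System.out.println(collisions(list)); // prints 1
    }
}
```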
I have a list of integers and I need to find out which range each falls in. I have a list of ranges, which might be of size 2 up to 15 at the maximum. Currently, for every integer, I check through the list of ranges and find its location, but this takes a lot of time, as the list of integers I need to check includes a few thousand entries.
// list of tuples
val numList : List[(Int,Int)] = List((1,4),(6,20),(8,15),(9,15),(23,27),(21,25))
//list of ranges
val rangesList:List[(Int,Int)] = List((1,5),(5,10),(15,30))
import scala.util.control.Breaks

def checkRegions(numPos: (Int, Int), posList: List[(Int, Int)]): Unit = {
  val loop = new Breaks()
  loop.breakable {
    for (va <- 0 until posList.length) {
      if (numPos._1 >= posList(va)._1 && numPos._2 <= posList(va)._2) {
        // I save "va"
        loop.break()
      }
    }
  }
}
Currently, for every tuple in numList, I go through rangesList to find its range and save that range's location. Is there any faster/better approach to this?
Update: it's actually a list of tuples that is compared against a list of ranges.
First of all, using apply on a List is problematic, since it takes linear run time.
List(1,2,3)(2) has to traverse the whole list to finally get the last element at index 2.
If you want your code to be efficient, you should either find a way around it or choose another data structure. Data structures like IndexedSeq have constant time indexing.
You should also avoid Breaks; to my knowledge it works via exceptions, and that is not good practice. There are always ways around it.
You can do something like this:
val numList : List[(Int,Int)] = List((1,4),(6,20),(8,15),(9,15),(23,27),(21,25))
val rangeList:List[(Int,Int)] = List((1,5),(5,10),(15,30))
def getRegions(numList: List[(Int, Int)], rangeList: List[(Int, Int)]) = {
  val indexedRangeList = rangeList.zipWithIndex
  numList.map { case (a, b) =>
    indexedRangeList
      .find { case ((min, max), index) => a >= min && b <= max }
      .fold(-1)(_._2)
  }
}
And use it like this:
getRegions(numList, rangeList)
//yields List(0, -1, -1, -1, 2, 2)
I chose to yield -1 when no range matches. The key thing is that the ranges are zipped with their indices beforehand; therefore we know, for each range, what index it has, and we never use apply.
If you use this method to get indices and then again access the ranges in rangeList via apply, you should consider changing to an IndexedSeq.
Using apply will of course only be costly when the number of ranges gets big. If, as you mentioned, it is only 2-15, then it is no problem; I just want to give you the general idea.
One approach uses parallel collections via par, together with indexWhere, which delivers the index of the first item in a collection that satisfies a condition.
For readability consider this predicate for checking interval inclusion,
def isIn( n: (Int,Int), r: (Int,Int) ) = (r._1 <= n._1 && n._2 <= r._2)
Thus,
val indexes = numList.par.map {n => rangesList.indexWhere(r => isIn(n,r))}
indexes: ParVector(0, -1, -1, -1, 2, 2)
delivers, for each number, the index in the ranges collection where it is included. Value -1 indicates the condition did not hold.
For associating numbers with range indexes, consider this,
numList zip indexes
res: List(((1,4), 0), ((6,20),-1), ((8,15),-1),
((9,15),-1), ((23,27),2), ((21,25),2))
Parallel collections may prove more efficient than their non-parallel counterparts when performing computations on a very large number of items.