The question
I have two ranges, call them v and w, that are sorted in a given fashion and can be compared (call this order relation T). I want to compare them lexicographically, but after sorting them in a different way (call this other order relation S). For this I do not really need the ranges to be completely sorted: I only need to lazily evaluate the elements of the sorted ranges until I find a difference. For example, if the maximum of v in this new order is larger than the maximum of w, then I only need to look at one element of each ordered range. In the worst case, when v == w, I'd have to look at every element.
I understand that C++20 std::ranges::views allows me to get a read-only view of v and w that is lazily evaluated. Is it possible to get a custom sorted view that is still lazily evaluated? If I were able to write some pseudocode like
auto v_view_sorted_S = v | std::views::lazily_sort();
auto w_view_sorted_S = w | std::views::lazily_sort();
Then I could simply call std::ranges::lexicographical_compare(v_view_sorted_S, w_view_sorted_S).
How does one implement this?
Would simply calling std::ranges::sort(std::views::all(v)) work, in the sense that it will accept a view instead of an actual range and, more importantly, evaluate the view lazily? I gather from the comments to the reply in this question that, under some conditions, std::ranges::sort can be applied to views, even transformed ones. But I suspect that it sorts them at call time; is that the case?
The case I want it used:
I am interested in any example, but the very particular use case that I have is the following. It is irrelevant to the question, but it helps put this in context.
The structures v and w are of the form
std::array<std::vector<unsigned int>,N> v;
Where N is a compile-time constant. Moreover, for each 0 <= i < N, v[i] is guaranteed to be non-increasing. The lexicographical order thus obtained for any two ordered arrays is what I called T above.
What I am interested in is comparing them by the following rule: given entries a = v[i][j] and b = v[k][l] with 0 <= i, k < N and j, l >= 0, declare a > b if that relation holds for them as unsigned integers, or if a == b as unsigned integers and i < k.
After ordering all entries of v and w with respect to this order, I then want to compare them lexicographically.
For example, if v = {{2,1,1}, {}, {3,1}}, w = {{2,1,0}, {2}, {3,0}} and z = {{2,1,0}, {3}, {2,0}}, then z > w > v.
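There is no lazily-sorting view in the standard library, and std::ranges::sort on a view sorts eagerly, at call time. One way to get the "stop at the first difference" behaviour without fully sorting either range is to heapify copies of the inputs and pop maxima on demand. The following is only a sketch under a few assumptions: the order S is supplied as a strict less-than s, the sorted order being compared is descending (largest element first) as in the example above, and compare_as_if_sorted is a made-up name, not a standard facility.

#include <algorithm>
#include <compare>
#include <vector>

// Sketch: compare two ranges as if each were sorted in descending order
// under s, materialising only as many elements as needed.
// make_heap is O(n); each further element costs O(log n).
template <typename T, typename Less>
std::strong_ordering compare_as_if_sorted(std::vector<T> v, std::vector<T> w, Less s)
{
    std::ranges::make_heap(v, s);   // the maximum under s is now v.front()
    std::ranges::make_heap(w, s);
    while (!v.empty() && !w.empty()) {
        if (s(v.front(), w.front())) return std::strong_ordering::less;
        if (s(w.front(), v.front())) return std::strong_ordering::greater;
        std::ranges::pop_heap(v, s); v.pop_back();   // expose the next element lazily
        std::ranges::pop_heap(w, s); w.pop_back();
    }
    if (v.empty() && w.empty()) return std::strong_ordering::equal;
    return v.empty() ? std::strong_ordering::less : std::strong_ordering::greater;
}

For the use case above you would first flatten each std::array<std::vector<unsigned int>, N> into a flat vector of (value, index) pairs and pass a comparator implementing S.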
Related
Think of some type S that implements <= and == (and what you can build from that). The order given by S::operator<= is not a total order, i.e., it may happen that you find x, y of type S such that neither x <= y nor y <= x.
Given a vector<S> v, I want to bring elements of v in some order such that v[i] < v[j] implies i < j.
To achieve this, I could choose a fixed total order on S that refines <=. But that takes no advantage of the fact that the elements of v are probably already (almost) in the correct order w.r.t. some other total order on S.
So, how would I bring v into an order satisfying my requirements with the least effort; at best, using STL algorithms?
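For what it's worth, one safe way to satisfy "v[i] < v[j] implies i < j" without assuming a total order is a stable, selection-style topological sort. The sketch below assumes a strict comparison less has been derived from S's operator<= and operator== (e.g. x <= y && !(x == y)); topo_stable_sort is a made-up name, and the loop is cubic in the worst case, so it illustrates the requirement rather than exploiting the near-sortedness.

#include <algorithm>
#include <vector>

// Repeatedly extract the earliest remaining element that no other remaining
// element is strictly less than.  This guarantees that x < y implies x is
// emitted before y, and keeps incomparable elements in their input order.
template <typename T, typename PartialLess>
std::vector<T> topo_stable_sort(std::vector<T> v, PartialLess less)
{
    std::vector<T> out;
    out.reserve(v.size());
    while (!v.empty()) {
        auto it = std::find_if(v.begin(), v.end(), [&](const T& x) {
            return std::none_of(v.begin(), v.end(),
                                [&](const T& y) { return less(y, x); });
        });
        out.push_back(*it);   // a minimal element always exists for a finite partial order
        v.erase(it);
    }
    return out;
}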
I have a set of sets of positive integers, std::set<std::set<int>> X. Now I am given a set std::set<int> V and I want to know if it occurs in X. Obviously, this can be done by invoking the function find, so X.find(V) != X.end() should return true if V is in X.
My question is about the complexity of this operation: if X contains n sets of positive integers, what is the time complexity of X.find(V)?
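For concreteness, a minimal usage sketch of the lookup being asked about (the contents of X and V are arbitrary examples):

#include <iostream>
#include <set>

int main()
{
    std::set<std::set<int>> X = {{1, 2, 3}, {4, 5}, {7}};
    std::set<int> V = {4, 5};
    // find performs O(log e) node comparisons (e = number of sets in X);
    // each comparison is a lexicographical compare of two std::set<int>.
    std::cout << std::boolalpha << (X.find(V) != X.end()) << '\n';   // prints true
}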
Searching in a set is O(log n) in the number of elements, regardless of what the elements are composed of, even other sets. If the element type is another set, all you need is an ordering predicate (using the address of the object is a safe default). However, searching for an integer nested inside the set of sets is going to be O(m log n) in general.
Suppose there are e sets in X and the sum of the sizes of all e sets is n, i.e., |S1| + |S2| + ... + |Se| = n. Then in the worst case X.find(V) will take O(m*log(e)), where m is the size of V, i.e., |V| = m. As you can see, it is independent of n.
Why? A set in the STL is typically implemented as a self-balancing binary search tree, so the height of the tree is always O(log(e)), where e is the number of elements currently in the tree. Now notice that in our case the nodes of the tree are themselves sets. std::set by default uses the less-than operator < to compare against another set of the same type, which takes O(min(|S1|, |S2|)) time.
Therefore, in the worst case, if the set V we want to find is one of the leaves of X and all the nodes on the branch from the root to V have size >= |V|, then every node comparison takes O(|V|) time, and since there are O(log(e)) nodes on this branch, the lookup takes O(m*log(e)) time.
Does anyone know if it's possible to turn this from O(m * n) to O(m + n)?
vector<int> theFirst;
vector<int> theSecond;
vector<int> theMatch;
theFirst.push_back( -2147483648 );
theFirst.push_back(2);
theFirst.push_back(44);
theFirst.push_back(1);
theFirst.push_back(22);
theFirst.push_back(1);
theSecond.push_back(1);
theSecond.push_back( -2147483648 );
theSecond.push_back(3);
theSecond.push_back(44);
theSecond.push_back(32);
theSecond.push_back(1);
for( int i = 0; i < theFirst.size(); i++ )
{
    for( int x = 0; x < theSecond.size(); x++ )
    {
        if( theFirst[i] == theSecond[x] )
        {
            theMatch.push_back( theFirst[i] );
        }
    }
}
Put the contents of the first vector into a hash set, such as std::unordered_set. That is O(m). Scan the second vector, checking if the values are in the unordered_set and keeping a tally of those that are. That is n lookups of a hash structure, so O(n). So, O(m+n). If you have l elements in the overlap, you may count O(l) for adding them to the third vector. std::unordered_set is in the C++0x draft and available in the latest gcc versions, and there is also an implementation in boost.
Edited to use unordered_set
Using C++11 syntax:
unordered_set<int> firstMap(theFirst.begin(), theFirst.end());
for (const int& i : theSecond) {
    if (firstMap.find(i) != firstMap.end()) {
        cout << "Duplicate: " << i << endl;
        theMatch.push_back(i);
    }
}
Now, the question still remains: what do you want to do with duplicates in the originals? Specifically, how many times should 1 appear in theMatch: 1, 2, or 4 times?
This outputs:
Duplicate: 1
Duplicate: -2147483648
Duplicate: 44
Duplicate: 1
Using this: http://www.cplusplus.com/reference/algorithm/set_intersection/
You should be able to achieve O(m log m + n log n), I believe. (set_intersection requires that the input ranges already be sorted.)
This might perform a bit differently than your solution for duplicate elements, however.
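For example, a sketch of the sort-then-intersect approach might look like this (intersect_sorted is an illustrative name; as noted, duplicates are matched pairwise rather than all-against-all):

#include <algorithm>
#include <iterator>
#include <vector>

// O(m log m + n log n) for the sorts plus O(m + n) for the merge-style
// intersection via std::set_intersection.
std::vector<int> intersect_sorted(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}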
Please correct me if I am wrong: you are suggesting the following solution for the intersection problem:
sort the two vectors, then step through both sorted vectors together until common elements are reached,
so the overall complexity will be
(n*log(n) + m*log(m)) + (n + m)
assuming k*log(k) as the complexity of sorting.
Am I right?
Of course, the complexity will depend on the complexity of sorting.
I would sort the longer array, O(n*log(n)), then search it for the elements of the shorter array, O(m*log(n)). The total is then O(n*log(n) + m*log(n)).
Assuming you want to produce theMatch from two data sets and you don't care about the data sets themselves, put one into an unordered_map (currently available from Boost and listed in the final committee draft for C++11), mapping each key to an integer that is incremented whenever the key is added, so that it keeps track of the number of times the key occurs. Then, when you get a hit on the other data set, you push_back the hit the number of times it occurred in the first set.
You can get to O(n log n + m log m) by sorting the vectors first, or O(n log n + m) by creating a std::map of one of them.
Caveat: these are not order-preserving operations, and theMatch will come out in different orders with different techniques. It looks to me like the order is likely considered arbitrary. If the order given in the code above is necessary, I don't think there's a better algorithm.
Edit:
Take data set A and data set B, of type Type. Create an unordered_map<Type, int>.
Go through data set A and check each member to see if it's in the map. If not, add the element to the map with the int set to 1. If it is, increment the int. Each of these operations is O(1) on average, so this step is O(len A).
Go through data set B and check each member to see if it's in the map. If not, go on to the next. If so, push_back the member onto the destination vector; the int is the number of times that value appears in data set A, so push_back the member that many times to duplicate the behavior given. Each of these operations is O(1) on average, so this step is O(len B).
This is average-case behavior. If you always hit the worst case, you're back to O(m*n). I don't think there's a way to guarantee O(m + n).
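A hedged sketch of the counting approach just described (intersect_counted is an illustrative name):

#include <unordered_map>
#include <vector>

// Count occurrences in A, then replay each hit from B as many times as it
// appears in A.  Average O(len A + len B + len output).
std::vector<int> intersect_counted(const std::vector<int>& a,
                                   const std::vector<int>& b)
{
    std::unordered_map<int, int> counts;            // value -> occurrences in a
    for (int x : a) ++counts[x];
    std::vector<int> match;
    for (int y : b) {
        auto it = counts.find(y);
        if (it != counts.end())
            match.insert(match.end(), it->second, y);   // push it count-in-a times
    }
    return match;
}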
If the order of the elements in the resulting array/set doesn't matter then the answer is yes.
For arbitrary element types with some order defined, the best algorithm is O(max(m,n)*log(min(m,n))). For numbers of limited size, the best algorithm is O(m+n).
Construct a searchable set from the elements of the smaller array: for arbitrary elements simply sorting it is fine, and for numbers of limited size it should be something like the intermediate table in a counting sort.
Iterate through the larger array and check whether each element is in the set constructed earlier: for arbitrary elements a binary search is fine (which is O(log(min(n,m)))), and for numbers a single table lookup is O(1).
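A sketch of the "arbitrary elements" variant, using sorting plus binary search (intersect_binary_search is an illustrative name):

#include <algorithm>
#include <vector>

// Sort the smaller array (O(min(m,n) log min(m,n))), then binary-search it
// for each element of the larger array (O(max(m,n) log min(m,n))).
std::vector<int> intersect_binary_search(std::vector<int> smaller,
                                         const std::vector<int>& larger)
{
    std::sort(smaller.begin(), smaller.end());
    std::vector<int> out;
    for (int x : larger)
        if (std::binary_search(smaller.begin(), smaller.end(), x))
            out.push_back(x);
    return out;
}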
Possible Duplicate:
Given two arrays a and b, find all pairs of elements (a1, b1) such that a1 belongs to array A and b1 belongs to array B and whose sum a1+b1 = k.
Given: An unsorted array A of integers
Input: An integer k
Output: All two-element sets with the sum of the elements in each set equal to k, in O(n).
Example:
A = {3,4,5,1,4,2}
Input: 6
Output: {3,3}, {5,1}, {4,2}
Note: I know an O(n log n) solution, but that would require the array to be sorted. Is there any way this problem can be solved in O(n)? A non-trivial C++ data structure can be used, i.e., there's no bound on space.
Make a constant-time lookup table (a hash set) so you can see whether a particular integer is included in your array (O(n)). Then, for each element A[i], see if k - A[i] is included. This takes constant time per element, so a total of O(n) time. This assumes the elements are distinct; it is not difficult to make it work with repeating elements.
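A possible sketch of this approach (print_pairs_summing_to is an illustrative name; the second set only ensures each pair is reported once):

#include <iostream>
#include <unordered_set>
#include <vector>

// O(n) to build the lookup table, O(1) average per lookup.
void print_pairs_summing_to(const std::vector<int>& a, int k)
{
    std::unordered_set<int> values(a.begin(), a.end());
    std::unordered_set<int> done;                 // values whose pair was already printed
    for (int x : a) {
        if (values.count(k - x) && !done.count(x) && !done.count(k - x)) {
            std::cout << "{" << x << "," << k - x << "}\n";
            done.insert(x);
        }
    }
}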
Just a simple algorithm off the top of my head:
Create a bitfield B that represents the numbers from 0 to k
For each number i in A:
    Set B[i]
    If B[k-i] is set, add (i, k-i) to the output
Now, as people have pointed out, if you need two instances of the number 3 in order to output (3, 3), then you just switch the order of the last two statements in the above algorithm.
Also I'm sure that there's a name for this algorithm, or at least some better one, so if anyone knows I'd be appreciative of a comment.
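A rough C++ rendering of the bitfield algorithm above might look like this (pairs_with_bitfield is an illustrative name; values outside 0..k are skipped since they cannot be part of a pair):

#include <iostream>
#include <vector>

void pairs_with_bitfield(const std::vector<int>& a, int k)
{
    std::vector<bool> B(k + 1, false);
    for (int i : a) {
        if (i < 0 || i > k) continue;
        B[i] = true;                 // set first, as in the pseudocode above;
        if (B[k - i])                // swap these two lines if {3,3} should require two 3s
            std::cout << "{" << i << "," << k - i << "}\n";
    }
}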
http://codepad.org/QR9ptUwR
This will print all pairs. The algorithm is the same as the one described by #bdares above.
I have used STL maps because we don't have hash tables in the STL.
One can reduce the Element Uniqueness problem to this, so there is no O(n) solution.
There are k pairs of integers that sum to k: {0,k}, {1,k-1}, and so on. Create an array B of size k+1 whose elements are boolean. For each element e of the array A, if e <= k && B[e] == false, set B[e] = true, and if B[k-e] == true, emit the pair {e, k-e}. This needs to be extended slightly for negative integers.
I've got two ways of fetching a bunch of data. The data is stored in a sorted vector<map<string, int> >.
I want to identify whether there are inconsistencies between the two vectors.
What I'm currently doing (pseudo-code):
for i in 0... min(length(vector1), length(vector2)):
    for (k, v) in vector1[i]:
        if v != vector2[i][k]:
            // report that k is bad for index i,
            // with vector1 having v, vector2 having vector2[i][k]

for i in 0... min(length(vector1), length(vector2)):
    for (k, v) in vector2[i]:
        if v != vector1[i][k]:
            // report that k is bad for index i,
            // with vector2 having v, vector1 having vector1[i][k]
This works in general, but breaks horribly if vector1 has a, b, c, d and vector2 has a, b, b1, c, d (it reports brokenness for b1, c, and d). I'm after an algorithm that tells me that there's an extra entry in vector2 compared to vector1.
I think I want to do something where, when I encounter mismatched entries, I look at the next entries in the second vector; if a match is found before the end of the second vector, I store the index i of the matching entry in the second vector and move on to matching the next entry in the first vector, starting at vector2[i+1].
Is there a neater way of doing this? Some standard algorithm that I've not come across?
I'm working in C++, so C++ solutions are welcome, but solutions in any language or pseudo-code would also be great.
Example
Given the arbitrary map objects: a, b, c, d, e, f and g;
With vector1: a, b, d, e, f
and vector2: a, c, e, f
I want an algorithm that tells me either:
Extra b at index 1 of vector1, and vector2's c != vector1's d.
or (I'd view this as an effectively equivalent outcome)
vector1's b != vector2's c and extra d at index 2 of vector1
Edit
I ended up using std::set_difference, and then doing some matching on the diffs from both sets to work out which entries were similar but different, and which had entries completely absent from the other vector.
Something like the std::mismatch algorithm
You could also use std::set_difference
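For illustration, a minimal sketch of the std::mismatch idea, assuming the question's element type std::map<std::string, int> (Record and report_first_difference are made-up names):

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, int>;

// std::mismatch stops at the first index where the two vectors differ
// (std::map::operator== compares both keys and values).
void report_first_difference(const std::vector<Record>& v1,
                             const std::vector<Record>& v2)
{
    auto [it1, it2] = std::mismatch(v1.begin(), v1.end(), v2.begin(), v2.end());
    if (it1 == v1.end() && it2 == v2.end())
        std::cout << "vectors are identical\n";
    else
        std::cout << "first mismatch at index " << (it1 - v1.begin()) << '\n';
}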
It sounds like you're looking for the diff algorithm. The idea is to identify the longest common subsequence of the two vectors (using map equality), then recurse down the non-common portions. Eventually you'll have an alternating list of vector sub-sequences that are identical, and sub-sequences that have no common elements. You can then easily produce whatever output you like from this.
Apply it to the two vectors, and there you go.
Note that since map comparison is expensive, if you can hash the maps (use a strong hash - collisions will result in incorrect output) and use the hashes for comparisons you'll save a lot of time.
Once you're down to the mismatched subsequences at the end, you'll have something like:
Input vectors: a b c d e f, a b c' d e f
Output:
COMMON a b
LEFT c
RIGHT c'
COMMON d e f
You can then individually compare the maps c and c' to figure out how they differ.
If you have a mutation and insertion next to each other, it gets more complex:
Input vectors: a b V W d e f, a b X Y d e f
Output:
COMMON a b
LEFT V W
RIGHT X Y
COMMON d e f
Determining whether to match V and W against X or Y (or not at all) is something you'll have to come up with a heuristic for.
Of course, if you don't care about how the content of the maps differ, then you can stop here, and you have the output you want.
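For context, a hedged sketch of the longest-common-subsequence computation at the heart of a diff, operating on per-element keys (for example, hashes of the maps as suggested above); lcs_pairs is an illustrative name, and the O(n*m) table is fine for modest inputs:

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the index pairs (i, j) of the common subsequence; everything
// outside those pairs becomes LEFT/RIGHT material in the diff output.
std::vector<std::pair<int, int>> lcs_pairs(const std::vector<std::size_t>& a,
                                           const std::vector<std::size_t>& b)
{
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    std::vector<std::vector<int>> L(n + 1, std::vector<int>(m + 1, 0));
    for (int i = n - 1; i >= 0; --i)
        for (int j = m - 1; j >= 0; --j)
            L[i][j] = (a[i] == b[j]) ? L[i + 1][j + 1] + 1
                                     : std::max(L[i + 1][j], L[i][j + 1]);
    std::vector<std::pair<int, int>> common;
    for (int i = 0, j = 0; i < n && j < m; ) {
        if (a[i] == b[j])                    { common.push_back({i, j}); ++i; ++j; }
        else if (L[i + 1][j] >= L[i][j + 1]) { ++i; }
        else                                 { ++j; }
    }
    return common;
}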
What exactly are you trying to achieve? Could you please define precisely what output you expect in terms of the input? Your pseudo code compares maps at the vector index. If that is not the correct semantics, then what is?
Could you associate with each map some kind of checksum (or Bloom filter), so that with a single check you could decide whether a full comparison is even worthwhile?
In your example, note that it is not possible to differentiate between
"extra b at index 1 of vector1, and vector2's c != vector1's d"
and
"extra b at index 1 of vector1, extra d at index 2 of vector1, and extra c at index 1 of vector2",
because it is not clear that "c" should be compared to "d"; it could just as well be compared to "b". I assume the vectors are not sorted, because std::map doesn't provide a relational operator. Rather, the maps themselves are sorted, which as far as I can see is completely irrelevant ;-)
So your example is slightly misleading. It could even be
Compare
b f e a d
with
a c f e
You can check each element of the first vector against each element of the second vector.
This has quadratic runtime.
for i in 0... length(vector1):
    foundmatch = false;
    for j in 0... length(vector2):
        mismatch = false;
        for (k, v) in vector1[i]:
            if v != vector2[j][k]:
                mismatch = true;
                break; // no need to compare against the remaining keys.
        if (!mismatch) // found matching element j in vector2 for element i in vector1
            foundmatch = true;
            break; // no need to compare against the remaining elements in vector2
    if (foundmatch)
        continue;
    else
        // report that vector1[i] has no matching element in vector2[]
        // "extra b at i"
If you want to find the missing elements, just swap vector1 and vector2.
If you want to check whether an element in vector2 mismatches an element in vector1 in only a single key, you have to add additional code around the "no need to compare against the remaining keys" break.