Find common elements in two integer lists

I have two given integer lists:
alist : TList<Integer>; // e.g. 1,2,3,4,5,6,7,8,9
blist : TList<Integer>; // e.g. 1,2,3,4,5
Resultlist : TList<Integer>;
IgnoreList : TList<Integer>; // e.g. 1,2,3
What is an efficient way to find the elements common to both lists, excluding elements from the ignore list? As I have to run this procedure over many items, I need an efficient and fast implementation.
Resultlist should be 4,5

I agree with Dmitry. Converting the lists to hash sets and looking values up in them would be fast irrespective of whether the lists are sorted.
Have a look at Delphi's TDictionary. Intersecting two TDictionary instances by lookup is one quick way of finding common elements. Otherwise:
1) Create a TDictionary for the blacklisted (ignored) elements.
2) Create a second TDictionary and insert the elements from alist that are not present in the blacklist dictionary. This operation is fast because TDictionary is optimized for lookup.
3) Finally, iterate over the elements of blist and only output the elements present in the alist dictionary.
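The three steps above can be sketched as follows. This is a minimal sketch in C++ rather than Delphi, using std::unordered_set in the role the answer gives to TDictionary; the function name commonExcluding is made up for illustration.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Returns the values present in both a and b but absent from ignore,
// in the order they appear in b. Each value is reported at most once.
std::vector<int> commonExcluding(const std::vector<int>& a,
                                 const std::vector<int>& b,
                                 const std::vector<int>& ignore) {
    // Step 1: hash set of the ignored values.
    std::unordered_set<int> ignored(ignore.begin(), ignore.end());

    // Step 2: hash set of the values of a that are not ignored.
    std::unordered_set<int> candidates;
    for (int v : a) {
        if (ignored.count(v) == 0) candidates.insert(v);
    }

    // Step 3: walk b and keep the values found among the candidates.
    std::vector<int> result;
    for (int v : b) {
        // erase() returns 1 only on the first hit, so repeats in b are skipped.
        if (candidates.erase(v) == 1) result.push_back(v);
    }
    return result;
}
```

With the lists from the question (1..9, 1..5, ignore 1,2,3) this yields 4,5. Expected cost is linear in the total number of elements, with no sorting needed.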

What you want for this is the list comparison algorithm. You take two sorted lists (make sure to sort them first if they aren't already) and two index variables, set both indices to 0, and start comparing values, advancing one index or the other if the values aren't equal, or both indices if they are. By customizing the behavior of the algorithm when unequal or equal values are found, there are a lot of things you can do with this. What you want here is, when equal values are found, to insert the value into the output list.
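A minimal C++ sketch of that list comparison, assuming plain integer values; the name sortedIntersection is hypothetical, and the inputs are taken by value so they can be sorted without touching the caller's data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Two-index walk over two sorted sequences, collecting equal values.
std::vector<int> sortedIntersection(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<int> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                    // a is behind, advance it
        } else if (b[j] < a[i]) {
            ++j;                    // b is behind, advance it
        } else {
            out.push_back(a[i]);    // equal: common element
            ++i;
            ++j;
        }
    }
    return out;
}
```

The ignore list from the question could be handled the same way with a third sorted list and a third index, or by filtering the result afterwards.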

Related

Implementation of non increasing list with STL C++

In a problem, the inputs are several lists of numbers,
e.g.
(1,5,4,3), (2,7,3,1,5), (1,9,1,7,3,7,2), (3,5,4,2,3).
where each list may appear twice.
In the final output the distinct lists should be printed, with the elements of each list sorted in non-increasing order and the lists themselves sorted the same way.
Is it possible to implement this whole thing with a map in C++?
Output for the above example should be
9,7,7,3,2,1,1
7,5,3,2,1
5,4,3,3,2
5,4,3,1
Simply put: a set of unique lists, where the numbers within each list are again sorted in non-increasing order.
std::set will definitely help you. You will get a unique sorted list as the resulting set if you insert your list into a std::set<int>
Edit:
std::set<std::multiset<int, std::greater<int>>> myList;
The inner multiset sorts in non-increasing order and keeps duplicate elements, and the outer set keeps only unique inner lists.
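A short sketch of that nested container in use. The alias DescList and the function uniqueDescending are hypothetical names. One assumption goes beyond the answer: the outer set is given std::greater as its comparator so that the lists come out largest first, as in the expected output of the question; with the default comparator they would be iterated in ascending order.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Inner container: keeps duplicates, iterates in non-increasing order.
using DescList = std::multiset<int, std::greater<int>>;

// Returns the distinct lists, each sorted descending, largest list first.
std::vector<std::vector<int>> uniqueDescending(
        const std::vector<std::vector<int>>& lists) {
    // Outer container: drops repeated lists; std::greater orders them descending.
    std::set<DescList, std::greater<DescList>> unique;
    for (const auto& l : lists) {
        unique.insert(DescList(l.begin(), l.end()));
    }

    std::vector<std::vector<int>> out;
    for (const auto& s : unique) {
        out.emplace_back(s.begin(), s.end());
    }
    return out;
}
```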

Fastest way to compare 2 c++ std::lists, changing in each iteration

Let's say I have 2 std::lists, each containing a varying number of elements. Every element (in each list) has a UNIQUE int id value. On every iteration I need to remove the elements from the first list that don't appear in the second one and add the elements from the second list that aren't in the first one. E.g. (numbers = unique ids):
iteration: first[3,2,1], second[4,3,5,7,6], result: [3,4,5,6,7]
iteration: first[3,4,5,6,7], second[4,10,9], result: [4,10,9]
etc...
I cannot simply swap the second one into the first one (let's just accept that it's impossible; the reason is too long to explain). My question is:
What is the best search algorithm I can use to update the first list? I mean, should I use nested loops on both sorted lists and compare ids? Continuously remove the elements of the first list that are missing from the second, but also delete the repeated ones in the first, and then merge? Or maybe make one of them an unordered_map (hash table)?
Edited:
I wanted to simplify the problem, but in fact it's unclear now. I cannot change the containers: there are 2 unsorted lists, each containing a different structure type. The only link between the 2 structure types is an id parameter. In every iteration I have to check whether the first list matches the second one. Ids are unique, no repeats allowed, so if the ids match the lists will be identical. I can't swap them because the first list has e.g. 30 values and the second one 10 (it's incomplete). There are other special functions that prepare the structure for the first list, which consists of many different structures (including the structure from list 2). These functions are launched only if there are ids from the second list that don't appear in the first list. I mustn't manipulate the first list but I'm able to modify the second one.
I tried it in these ways. In every iteration:
1. Create a std::unordered_set with the hashed ids from the second list. Then compare it to the first list and remove outdated ids from the first list. Also remove the ids found in both lists from the unordered_set. We'll end up with the set of new structures from list 2. We can run the other functions and then add the suitable ones to the first list.
2. Sort list2 by ids. Then do a binary search.
3. Linear search. 2 loops. An id that appears in the first list and not in the second one is removed from the first list. An id that appears in both lists is removed from the second list. Finally we are left with the ids that appear in the second list and not in the first one. We can process them and merge them with list 1.
The most important thing: There will be a lot of comparisons but lists are the same most of the time!
The most efficient way to do this is probably going to be to simply assign the second list to the first:
first = second;
which will copy all the elements in second and put them in first.
If for some reason you need to keep the elements in place, you can sort each list (std::list::sort) and use the std::set_difference algorithm to find all the elements in one list that are not in the other list.
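A minimal sketch of the sort-plus-std::set_difference idea, assuming the lists hold bare int ids (the question's real elements are structures keyed by id). The helper name missingIds is hypothetical, and it works on copies so the callers' lists keep their order.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Ids that are present in `from` but missing in `other`.
// Both parameters are copies, so sorting them does not disturb the originals.
std::list<int> missingIds(std::list<int> from, std::list<int> other) {
    from.sort();    // std::list has its own sort; std::sort needs random access
    other.sort();

    std::list<int> out;
    std::set_difference(from.begin(), from.end(),
                        other.begin(), other.end(),
                        std::back_inserter(out));
    return out;
}
```

Calling missingIds(first, second) gives the ids to remove from the first list, and missingIds(second, first) gives the ids that still have to be added to it.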

Using sorting algorithms on a Queue?

I'm using a Queue abstract data type that is based on a singly linked list. I want to sort the data the queue holds in 3 ways: first with merge sort, second with quick sort, third with heap sort. Can anyone help with this?
Ordinarily a queue is sorted by insertion order - items are sorted by the order in which they were inserted into the queue. It appears you want to break that essential quality of a queue.
I'm only going to cover merge sorting with this answer. Hopefully others will cover the other algorithms or you can derive them yourself.
A singly linked list can be treated as a list of lists simply by knowing where one list ends and the next begins. For a merge sort you need to start with sorted lists: if each list has a length of 1, it is sorted simply because no other order is possible. Merging two sorted linked lists into one is easy: you repeatedly take the smaller of the two front items and link it onto a new list, until both lists are exhausted. So for the first pass, you break the list into sublists of length 1 and combine them into sublists of length 2. In the second pass you merge the sublists of length 2 into sublists of length 4. Each pass doubles the size of the sorted sublists. You're finished when the size of the sorted sublists is greater than or equal to the size of your entire list.
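The passes described above can be sketched as a bottom-up merge sort. This is only a sketch under assumptions: the Node type and the function names split, mergeRuns and mergeSort are hypothetical, and a real queue would also need its tail pointer refreshed after sorting.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    int value;
    Node* next;
};

// Cuts the list after its first n nodes and returns the remainder
// (nullptr if the list has n nodes or fewer).
Node* split(Node* head, std::size_t n) {
    for (std::size_t i = 1; head != nullptr && i < n; ++i) head = head->next;
    if (head == nullptr) return nullptr;
    Node* rest = head->next;
    head->next = nullptr;
    return rest;
}

// Merges two sorted runs, links the result after tail, returns the new tail.
Node* mergeRuns(Node* a, Node* b, Node* tail) {
    while (a != nullptr && b != nullptr) {
        if (b->value < a->value) { tail->next = b; b = b->next; }
        else                     { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = (a != nullptr) ? a : b;   // append whatever is left over
    while (tail->next != nullptr) tail = tail->next;
    return tail;
}

// Bottom-up merge sort: the sorted run width doubles on every pass.
Node* mergeSort(Node* head) {
    std::size_t length = 0;
    for (Node* p = head; p != nullptr; p = p->next) ++length;

    Node dummy{0, head};                   // placeholder in front of the list
    for (std::size_t width = 1; width < length; width *= 2) {
        Node* tail = &dummy;
        Node* cur = dummy.next;
        while (cur != nullptr) {
            Node* left = cur;
            Node* right = split(left, width);   // left now holds <= width nodes
            cur = split(right, width);          // right too; cur is the rest
            tail = mergeRuns(left, right, tail);
        }
    }
    return dummy.next;
}
```

Only links are rearranged; no nodes are allocated or copied, which is what makes merge sort a natural fit for a linked list.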

What is the fastest way to return x,y coordinates that are present in both list A and list B?

I have two lists (list A and list B) of x,y coordinates where 0 < x < 4000, 0 < y < 4000, and they will always be integers. I need to know what coordinates are in both lists. What would be your suggestion for how to approach this?
I have been thinking about representing the lists as two grids of bits and doing bitwise & possibly?
List A has about 1000 entries and changes maybe once every 10,000 requests. List B will vary wildly in length and will be different on every run through.
EDIT: I should mention that no coordinate will be in lists twice; 1,1 cannot be in list A more than once for example.
Represent (x,y) as a single 24 bit number as described in the comments.
Maintain A in numerical order (you said it doesn't vary much, so this should be hardly any cost).
For each element of B do a binary search on the list. Since A is about 1000 items big, you'll need at most 10 integer comparisons (in the worst case) to check for membership.
If you have a bit more memory (about 2MB) to play with, you could create a bit-vector covering all possible 24 bit numbers and then perform a single bit operation per item to test for membership. So A would be represented by a single vector of 2^24 bits, with a bit set to 1 if the value is there (otherwise 0). To test for membership you would just use the appropriate bitwise AND operation.
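A minimal sketch of the bit-vector variant. It assumes the packing is 12 bits per coordinate (4000 < 4096 = 2^12), since the comments the answer refers to are not part of this page; packKey and CoordinateBitmap are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Packs (x, y), each in [0, 4096), into a single 24-bit key.
inline std::uint32_t packKey(int x, int y) {
    return (static_cast<std::uint32_t>(x) << 12) | static_cast<std::uint32_t>(y);
}

// One bit per possible key: 2^24 bits = 2 MiB, stored as 64-bit words.
class CoordinateBitmap {
public:
    CoordinateBitmap() : words_((1u << 24) / 64, 0) {}

    void insert(int x, int y) {
        const std::uint32_t k = packKey(x, y);
        words_[k >> 6] |= std::uint64_t{1} << (k & 63);
    }

    bool contains(int x, int y) const {
        const std::uint32_t k = packKey(x, y);
        return ((words_[k >> 6] >> (k & 63)) & 1u) != 0;
    }

private:
    std::vector<std::uint64_t> words_;   // heap storage, too large for the stack
};
```

List A would be inserted once and the bitmap rebuilt only when A changes; each coordinate of B then costs one shift and one AND.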
Put the coordinates of list A into some kind of a set (probably a hash, bst, or heap), then you can quickly see if the coordinate from list B is present.
Whether you expect the coordinates from list B to be present in the set or not determines which underlying data structure you should use.
Hashes are good at telling you if something is in it, though depending on how it's implemented, could behave poorly when trying to find something that isn't in it.
bst and heaps are equally good at telling you if something is in it or not, but don't perform theoretically as well as hashes when something is in it.
Since A is rather static you may consider building a query structure and checking for all elements in B whether they occur in A. One example would be a std::set of coordinate pairs (e.g. std::set<std::pair<int, int> > A), which you can query like A.find(element_from_b) != A.end() ...
So the running time in total is worst case O(b log a) (where b is the number of elements in B, and a the number in A). Note also that since a is always about 1000, log a is basically constant.
Define an ordering based on their lexicographic order (sort first on x, then on y). Sort both lists based on that ordering in O(n log n) time, where n is the larger of the number of elements of each list. Set a pointer to the first element of each list and advance the one that points to the lesser element; when the pointers refer to elements with the same value, put that value into a set (to avoid multiplicities within each list). This last part can be done in O(n) time (or O(m log m), where m is the number of elements common to both lists).
Update (based on comment below and edit above): Since no point appears more than once in each list, you can use a list, vector, deque, or some other container with (amortized) constant time insertion to hold the points common to both, realizing the O(n) time performance regardless of the number of common elements.
This is easy if you implement an STL predicate which orders two pairs (i.e. return R.x < L.x || (R.x == L.x && R.y < L.y);). You can then call std::list::sort to order them, and std::set_intersection to find the common elements. No need to write the algorithms yourself.
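A sketch of the predicate-plus-std::set_intersection approach. The Point struct and the names pointLess and commonPoints are hypothetical; the predicate here is written in the conventional ascending form (left less than right), which works as long as the same predicate is passed to both the sort and the intersection.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

struct Point {
    int x;
    int y;
};

// Strict weak ordering: by x first, then by y.
bool pointLess(const Point& l, const Point& r) {
    return l.x < r.x || (l.x == r.x && l.y < r.y);
}

// Returns the points present in both lists, in sorted order.
std::list<Point> commonPoints(std::list<Point> a, std::list<Point> b) {
    a.sort(pointLess);
    b.sort(pointLess);

    std::list<Point> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out), pointLess);
    return out;
}
```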
This is the kind of problem that just screams "Bloom Filter" at me.
If I understand correctly, you want the common coordinates in X and Y, that is, the intersection of (sets) list A and list B? If you are using the STL:
#include <algorithm> // std::set_intersection
#include <iterator>  // std::inserter
#include <set>
using namespace std;
// ...
set<int> a; // x coord (assumed populated elsewhere)
set<int> b; // y coord (assumed populated elsewhere)
set<int> in; // intersection
// ...
// std::set iterates in sorted order, which set_intersection requires.
set_intersection(a.begin(), a.end(), b.begin(), b.end(), inserter(in, in.begin()));
I think hashing is your best bet.
// Pseudocode:
INPUT: two lists, each with (x,y) coordinates
1. find the list that's longer, call it A
2. hash each element in A
3. go to the other list, call it B
4. hash each element in B and look it up in the table.
5. if there's a match, return/store (x,y) somewhere
6. repeat #4 till the end
Assuming the length of A is m and B's length is n, the expected run time is O(m + n), which is O(m) because A is the longer list.
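The pseudocode above might look like this in C++. This is a sketch under assumptions: coordinates are held as std::pair<int, int>, each pair is packed into a 64-bit key so the standard integer hash can be used, and the names Coord and commonCoordinates are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

using Coord = std::pair<int, int>;

// Returns the coordinates present in both lists, in the order of the shorter list.
std::vector<Coord> commonCoordinates(const std::vector<Coord>& a,
                                     const std::vector<Coord>& b) {
    // Pack (x, y) into one 64-bit key: x in the high half, y in the low half.
    auto key = [](const Coord& c) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(c.first)) << 32) |
               static_cast<std::uint32_t>(c.second);
    };

    // Steps 1-3: hash the longer list, probe with the other one.
    const std::vector<Coord>& longer = a.size() >= b.size() ? a : b;
    const std::vector<Coord>& shorter = a.size() >= b.size() ? b : a;

    std::unordered_set<std::uint64_t> table;
    table.reserve(longer.size());
    for (const Coord& c : longer) table.insert(key(c));

    // Steps 4-6: look up every element of the shorter list.
    std::vector<Coord> out;
    for (const Coord& c : shorter) {
        if (table.count(key(c)) != 0) out.push_back(c);
    }
    return out;
}
```

For the scenario in the question, where list A rarely changes, it may be preferable to always hash A and keep the table between requests instead of rebuilding it from the longer list each time.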

intersection of n lists

I have been trying to figure out a way to find the intersection of N lists in C++.
The method that comes to mind is sort, merge and iterate.
Is there any other way?
Please share your suggestions.
Sort each list using std::sort (or if it's an std::list, use std::list::sort) then compute the intersections using std::set_intersection iteratively (apply it to the first two lists, then to the result and the third list, then to the result and the fourth list, and so on).
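A minimal sketch of that iterative approach, assuming the lists are held as std::vector<int>; the function name intersectAll is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Intersects all lists by folding std::set_intersection over them.
// The lists are taken by value so they can be sorted locally.
std::vector<int> intersectAll(std::vector<std::vector<int>> lists) {
    if (lists.empty()) return {};
    for (auto& l : lists) std::sort(l.begin(), l.end());

    std::vector<int> result = lists[0];
    // Stop early once the running intersection is empty.
    for (std::size_t i = 1; i < lists.size() && !result.empty(); ++i) {
        std::vector<int> next;
        std::set_intersection(result.begin(), result.end(),
                              lists[i].begin(), lists[i].end(),
                              std::back_inserter(next));
        result.swap(next);
    }
    return result;
}
```

Sorting the lists by length first and starting with the shortest one would keep the intermediate results small, though the answer does not require it.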
A solution using unsorted lists would be messier. Presumably, you'd have an 'answer' list, initially empty. Then you'd identify two lists, and step through one; for each element, you'd scan the other list to see if it is present in that list - storing the element in the answer if there's a match. Then you'd create a new empty answer list, and step through another of the original lists, searching through the previous answer list for a matching element, and adding to the new answer list. Repeat ad nauseam.
This is not particularly efficient.
The 'sort, merge, and iterate' solution, working with pairs of lists, is also not as efficient as iterating through the N sorted lists simultaneously, selecting only the elements that appear in all lists as part of the answer.
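One way the simultaneous iteration could look, as a sketch only: it assumes every input list is already sorted in ascending order, and the function name intersectSortedParallel is hypothetical. On each round the largest of the current front elements becomes the candidate, and every list is advanced until it reaches or passes it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Intersects N ascending-sorted lists in one simultaneous pass.
std::vector<int> intersectSortedParallel(
        const std::vector<std::vector<int>>& sorted) {
    std::vector<int> out;
    if (sorted.empty()) return out;

    std::vector<std::size_t> pos(sorted.size(), 0);   // one cursor per list
    while (true) {
        // The candidate is the largest current element; stop if a list ran out.
        int candidate = 0;
        for (std::size_t k = 0; k < sorted.size(); ++k) {
            if (pos[k] == sorted[k].size()) return out;
            const int head = sorted[k][pos[k]];
            if (k == 0 || head > candidate) candidate = head;
        }

        // Advance every cursor past the elements smaller than the candidate.
        bool allEqual = true;
        for (std::size_t k = 0; k < sorted.size(); ++k) {
            while (pos[k] < sorted[k].size() && sorted[k][pos[k]] < candidate) ++pos[k];
            if (pos[k] == sorted[k].size()) return out;
            if (sorted[k][pos[k]] != candidate) allEqual = false;
        }

        // Present in every list: record it and step all cursors forward.
        if (allEqual) {
            out.push_back(candidate);
            for (std::size_t k = 0; k < sorted.size(); ++k) ++pos[k];
        }
    }
}
```

Each cursor only ever moves forward, so the pass is linear in the total number of elements and needs no intermediate result lists.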