Possible duplicate: Given two arrays A and B, find all pairs of elements (a1, b1) such that a1 belongs to array A, b1 belongs to array B, and a1 + b1 = k.
Given: An unsorted array A of integers
Input: An integer k
Output: All two-element sets whose elements sum to k, found in O(n).
Example:
A = {3,4,5,1,4,2}
Input: 6
Output: {3,3}, {5,1}, {4,2}
Note: I know an O(n log n) solution, but it requires the array to be sorted. Is there any way to solve this problem in O(n)? A non-trivial C++ data structure can be used, i.e., there's no bound on space.
Make a constant-time lookup table (hash) so you can see if a particular integer is included in your array (O(n)). Then, for each element in the array, see if k-A[i] is included. This takes constant time for each element, so a total of O(n) time. This assumes the elements are distinct; it is not difficult to make it work with repeating elements.
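A minimal sketch of the hash-table answer in C++11, assuming (as the answer does) that the elements are distinct and that k - x does not overflow an int. The function name pairsWithSum is illustrative; std::unordered_set provides the expected constant-time lookups.

```cpp
#include <unordered_set>
#include <utility>
#include <vector>

// Pass 1 loads every value into a hash set (expected O(n)); pass 2 checks
// whether k - x is present (expected O(1) each). Assumes distinct elements.
// The x < y test reports each pair once and skips the self-pair {x, x}.
std::vector<std::pair<int, int>> pairsWithSum(const std::vector<int>& a, int k) {
    std::unordered_set<int> seen(a.begin(), a.end());
    std::vector<std::pair<int, int>> out;
    for (int x : a) {
        int y = k - x;                    // assumes k - x fits in an int
        if (x < y && seen.count(y) != 0)
            out.emplace_back(x, y);
    }
    return out;
}
```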
Just a simple algorithm off the top of my head:
Create a bitfield B that represents the numbers from 0 to k
For each number i in A:
    Set B[i]
    If B[k-i] is set, add (i, k-i) to the output
As others have pointed out, if you need two instances of the number 3 to output (3, 3), just switch the order of the last two statements in the algorithm above.
Also, I'm sure there's a name for this algorithm, or at least a better one, so if anyone knows, I'd appreciate a comment.
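The bitfield algorithm could look like this, with std::vector<bool> as the bitfield. This sketch assumes only values in [0, k] matter (non-negative inputs) and uses the "check, then set" order, so {3, 3} needs two 3s in A; the function name is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bitfield B covers the values 0..k. For each i: first test B[k - i],
// then set B[i], so a pair {x, x} is only reported if x occurs twice.
std::vector<std::pair<int, int>> pairsWithSumBitfield(const std::vector<int>& a, int k) {
    std::vector<std::pair<int, int>> out;
    if (k < 0) return out;
    std::vector<bool> seen(static_cast<std::size_t>(k) + 1, false);
    for (int i : a) {
        if (i < 0 || i > k) continue;    // cannot be part of a non-negative pair
        if (seen[k - i])                 // partner already seen?
            out.emplace_back(i, k - i);
        seen[i] = true;                  // then record i itself
    }
    return out;
}
```

Note that repeated values can produce the same pair more than once, e.g. {5, 1, 1} gives (1, 5) twice.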
http://codepad.org/QR9ptUwR
This will print all pairs. The algorithm is the same as the one described by bdares above.
I have used STL maps, as we don't have hash tables in the STL.
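The codepad code is not reproduced here, so the following is only a sketch in the same spirit using std::map, under the assumption that each distinct pair should be printed once and {x, x} only when x occurs at least twice. With std::map it is O(n log n); swapping in std::unordered_map (C++11) would give expected O(n).

```cpp
#include <map>
#include <utility>
#include <vector>

// Count occurrences, then walk the distinct values in ascending order.
std::vector<std::pair<int, int>> pairsWithSumMap(const std::vector<int>& a, int k) {
    std::map<int, int> count;                  // value -> number of occurrences
    for (int x : a) ++count[x];
    std::vector<std::pair<int, int>> out;
    for (const auto& entry : count) {
        int x = entry.first;
        int y = k - x;                         // assumes no int overflow
        if (x > y) break;                      // remaining pairs already reported
        if (x == y ? entry.second >= 2 : count.count(y) != 0)
            out.emplace_back(x, y);
    }
    return out;
}
```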
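One can reduce the element uniqueness problem to this one, so there is no O(n) solution (element uniqueness needs Ω(n log n) time in the comparison-based model).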
There are only k+1 pairs of non-negative integers that sum to k: {0,k}, {1,k-1}, and so on. Create an array B of size k+1 whose elements are booleans. For each element e of the array A, if e <= k && B[e] == false, set B[e] = true, and if B[k-e] == true, emit the pair {e,k-e}. This needs to be extended slightly for negative integers.
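One possible way to do the extension for negative integers is to offset the boolean array by the smallest value in A. This sketch assumes the range of values (max minus min) is small enough to allocate; it keeps the answer's order (mark e first, then test k - e), so a single 3 yields {3, 3} for k = 6, matching the question's example. The function name is illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Boolean array indexed by value - lo, where [lo, hi] is the range of A.
std::vector<std::pair<int, int>> pairsWithSumOffset(const std::vector<int>& a, int k) {
    std::vector<std::pair<int, int>> out;
    if (a.empty()) return out;
    auto mm = std::minmax_element(a.begin(), a.end());
    long long lo = *mm.first, hi = *mm.second;
    std::vector<bool> b(static_cast<std::size_t>(hi - lo + 1), false);
    for (int e : a) {
        std::size_t idx = static_cast<std::size_t>(e - lo);
        if (b[idx]) continue;                        // value already handled
        b[idx] = true;                               // mark e first...
        long long partner = static_cast<long long>(k) - e;
        if (partner >= lo && partner <= hi &&        // ...then test k - e
            b[static_cast<std::size_t>(partner - lo)])
            out.emplace_back(e, static_cast<int>(partner));
    }
    return out;
}
```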
I'm practicing lambdas:
#include <algorithm>
#include <vector>
int main()
{
std::vector<int> v {1,2,3,4};
std::sort(v.begin(), v.end(), [](const int& a, const int& b) -> bool
{
return a > b;
});
}
This is just code from GeeksForGeeks to sort in descending order, nothing special. I added some print statements (but took them out for this post) to see what was going on inside the lambda. They print the entire vector, and the a and b values:
1 2 3 4
a=2 b=1
2 1 3 4
a=3 b=2
3 2 1 4
a=4 b=3
4 3 2 1 <- final
So my more detailed question is:
What's the logic behind the order the vector elements are being passed into the a and b parameters?
Is b permanently at index 0 while a is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element? Is it compiler-specific? Thanks!
By passing a predicate to std::sort(), you are specifying your sorting criterion. The predicate must return true if the first parameter (i.e., a) precedes the second one (i.e., b), for the sorting criterion you are specifying.
Therefore, for your predicate:
return a > b;
If a is greater than b, then a will precede b.
So my more detailed question is: What's the logic behind the order the vector elements are being passed into the a and b parameters?
a and b are just pairs of the elements you pass to std::sort(). The "logic" depends on the algorithm that std::sort() implements internally. The pairs may also differ between calls with identical input if the implementation uses randomization.
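To see this with your own standard library, the comparator can record its arguments instead of printing them. This sketch (the function name is illustrative) logs every (a, b) pair; the recorded sequence is implementation-specific, and only the final order is guaranteed.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sorts v in descending order and returns every (a, b) pair that std::sort
// passed to the comparator, in call order.
std::vector<std::pair<int, int>> sortDescendingAndLog(std::vector<int>& v) {
    std::vector<std::pair<int, int>> calls;
    std::sort(v.begin(), v.end(), [&calls](const int& a, const int& b) {
        calls.emplace_back(a, b);   // log the arguments of this comparison
        return a > b;               // true means a must come before b
    });
    return calls;
}
```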
Is 'b' permanently at index 0 while 'a' is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element?
No; it only looks that way because the first element is the highest.
It seems that, with this algorithm, each new element is first compared (and possibly swapped) with the highest one, and the highest one is placed in the first position; so b always points to the highest one.
For Visual Studio, std::sort uses insertion sort if the sub-array size is <= 32 elements. For a larger sub-array, it uses introsort, which is quicksort unless the recursion depth gets too deep, in which case it switches to heapsort. The output your program produces appears to correspond to some variation of insertion sort. The compare function plays the role of "less than", and insertion sort looks for out-of-order pairs where the left value is "greater than" the right one, so it calls the comparator as comp(right, left); that is why the input parameters appear swapped.
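The swapped-looking calls can be reproduced with a simplified insertion sort that, like the small-range path described above, first compares each new element with the front element. This is an illustrative assumption, not the actual library source; for {1,2,3,4} with a > b it records exactly (2,1), (3,2), (4,3), matching the trace in the question.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Simplified insertion sort that logs each comparator call.
// Each new element is compared with the FRONT first: comp(new, front).
// If true it moves straight to the front; otherwise it is inserted by
// scanning leftwards. Calls are always comp(right, left).
template <class Comp>
std::vector<std::pair<int, int>> insertionSortLogged(std::vector<int>& v, Comp comp) {
    std::vector<std::pair<int, int>> calls;
    auto logged = [&](int a, int b) { calls.emplace_back(a, b); return comp(a, b); };
    for (std::size_t i = 1; i < v.size(); ++i) {
        int val = v[i];
        if (logged(val, v[0])) {               // belongs before everything
            for (std::size_t j = i; j > 0; --j) v[j] = v[j - 1];
            v[0] = val;
        } else {                               // stops at index 1 at the latest
            std::size_t j = i;
            while (logged(val, v[j - 1])) { v[j] = v[j - 1]; --j; }
            v[j] = val;
        }
    }
    return calls;
}
```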
You just compare two elements, with a given ordering. This means that if the order is a and then b, then the lambda must return true.
Whether a or b is the first or the last element of the array, or stays fixed, depends on the sorting algorithm and, of course, on your data!
Let's say we have the following 2D array of integers:
1 3 3 1
1 0 2 2
2 0 3 1
1 1 1 0
2 1 1 3
I was trying to create an implementation where the user could give as input the array itself and a string. For the above example, a string could be "03", which would mean that the user wants to sort the array based on the first and the fourth columns.
So in this case the result of the sorting would be the following:
1 1 1 0
1 3 3 1
1 0 2 2
2 0 3 1
2 1 1 3
I didn't know much about the compare functions used by the STL's sort function; however, after searching I created the following simple implementation.
I created a class called Comparator (in Comparator.h):
#include <string>
#include <vector>
class Comparator{
private:
std::string attr;
public:
Comparator(std::string attr) { this->attr = attr; }
bool operator()(const int* first, const int* second) const {
std::vector<int> left;
std::vector<int> right;
size_t i;
for(i=0;i<attr.size();i++){
left.push_back(first[attr.at(i) - '0']);
right.push_back(second[attr.at(i) - '0']);
}
for(i=0;i<left.size();i++){
if(left[i] < right[i]) return true;
else if(left[i] > right[i]) return false;
}
return false;
}
};
I need the information inside the string, so I need a class where this string is a private member. The operator takes two parameters, first and second, each of which refers to a row. With this information I create a left vector holding only the numbers of the first row that matter for the sorting (the columns specified by the string), and a right vector holding the same columns of the second row.
Then I do the needed comparisons and return true or false. The user can use this class by calling this function in Sorting.cpp:
void Sorting::applySort(int **data, std::string attr, int amountOfRows){
std::sort(data, data+amountOfRows, Comparator(attr));
}
Here is an example use:
int main(void){
//create an int** data (an array of row pointers) and fill it with integers
Sorting sort;
sort.applySort(data, "03", number_of_rows);
}
I have two questions:
First question
Can my implementation be improved? I use extra variables like the left and right vectors, and then some for loops, which add extra cost to the sorting operation.
Second question
Given the extra cost, how much worse does the time complexity of the sort become? I know that STL's sort is O(n*log n), where n is the number of integers you want to sort. Here n has a different meaning: n is the number of rows, and each row can have up to m integers, which the Comparator class reads in its overloaded operator() using extra variables (the vectors) and for loops.
Because I'm not sure exactly how STL's sort is implemented, I can only make some estimates.
My initial estimate would be O(n*m*log(n)), where m is the number of columns that matter for the sorting; however, I'm not 100% certain about it.
Thank you in advance
You can certainly improve your comparator. There's no need to copy the columns and then compare them. Instead of the two push_back calls, just compare the values and either return true, return false, or continue the loop according to whether they're less, greater, or equal.
The relevant part of the complexity of sort is O(n * log n) comparisons (guaranteed in the worst case since C++11; C++03 only required it on average), where n is the number of elements being sorted. So provided your comparator is O(m), your estimate for sorting the n rows is OK, and since your comparator does at most attr.size() <= m column comparisons, you're right.
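A minimal sketch of that improvement, assuming rows are stored as std::vector<int> rather than int* (the class and function names are illustrative): each selected column is compared in place, and the loop continues only on equality.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ColumnComparator {
public:
    explicit ColumnComparator(std::string attr) : attr_(std::move(attr)) {}

    bool operator()(const std::vector<int>& first, const std::vector<int>& second) const {
        for (char c : attr_) {
            std::size_t col = static_cast<std::size_t>(c - '0');  // "03" -> columns 0, 3
            if (first[col] < second[col]) return true;
            if (first[col] > second[col]) return false;
            // equal in this column: go on to the next one
        }
        return false;  // equal on every selected column
    }

private:
    std::string attr_;
};

void sortRows(std::vector<std::vector<int>>& rows, const std::string& attr) {
    std::sort(rows.begin(), rows.end(), ColumnComparator(attr));
}
```

With the 5x4 example from the question and attr "03", this yields the sorted order shown in the question.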
First question: you don't need left and right. You add elements one by one and then iterate over the vectors in the same order, so instead of pushing values into vectors and then iterating over them, simply use the values as you generate them in the first loop, like so:
for(i=0;i<attr.size();i++){
int left = first[attr.at(i) - '0'];
int right = second[attr.at(i) - '0'];
if(left < right) return true;
else if(left > right) return false;
}
Second question: can the time complexity be improved? Not with a sorting algorithm that uses direct comparisons. On the other hand, the problem you are solving here is somewhat similar to radix sort, so I believe you should be able to do the sorting in O(n*m), where m is the number of sorting criteria.
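One way to realize the radix-sort idea is an LSD pass per criterion with a stable counting sort, starting from the last column in attr. This sketch assumes every value in a key column lies in [0, maxValue] (a parameter not in the question), so the cost is O(m * (n + maxValue)), which is O(n*m) when the value range is small. The function name is illustrative.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One stable counting sort per criterion, last criterion first, so the
// earlier columns in attr end up taking precedence.
std::vector<std::vector<int>> radixSortRows(std::vector<std::vector<int>> rows,
                                            const std::string& attr, int maxValue) {
    for (auto it = attr.rbegin(); it != attr.rend(); ++it) {
        std::size_t col = static_cast<std::size_t>(*it - '0');
        std::vector<std::size_t> start(static_cast<std::size_t>(maxValue) + 2, 0);
        for (const auto& r : rows) ++start[r[col] + 1];            // histogram
        for (std::size_t v = 1; v < start.size(); ++v)
            start[v] += start[v - 1];                              // prefix sums
        std::vector<std::vector<int>> out(rows.size());
        for (auto& r : rows) {
            std::size_t dest = start[r[col]]++;   // read the key before moving the row
            out[dest] = std::move(r);             // stable placement
        }
        rows = std::move(out);
    }
    return rows;
}
```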
1) First, convert the string into an integer array in the constructor, validating that each value is less than the number of columns.
(You could also have another constructor that takes an integer array as a parameter.
A slight enhancement is to allow negative values to indicate that the sort order is reversed for that column. In this case the values would be -N..-1, 1..N.)
2) There is no need for the intermediate left, right arrays.
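A sketch of both suggestions, with hypothetical names: the string constructor validates the columns against numColumns, and a second constructor takes signed 1-based column numbers where -c sorts column c in descending order. It compares values directly, so there are no intermediate left and right arrays.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class SignedColumnComparator {
public:
    // "03" -> columns 0 and 3, ascending; validated against numColumns.
    SignedColumnComparator(const std::string& attr, int numColumns) {
        for (char c : attr) {
            int col = c - '0';
            if (col < 0 || col >= numColumns)
                throw std::invalid_argument("column out of range");
            cols_.push_back(col + 1);          // stored 1-based, ascending
        }
    }

    // Signed 1-based columns: c sorts column c ascending, -c descending.
    SignedColumnComparator(std::vector<int> cols, int numColumns) : cols_(std::move(cols)) {
        for (int c : cols_)
            if (c == 0 || c > numColumns || c < -numColumns)
                throw std::invalid_argument("column out of range");
    }

    bool operator()(const std::vector<int>& a, const std::vector<int>& b) const {
        for (int c : cols_) {
            std::size_t i = static_cast<std::size_t>(c > 0 ? c - 1 : -c - 1);
            if (a[i] != b[i]) return c > 0 ? a[i] < b[i] : a[i] > b[i];
        }
        return false;
    }

private:
    std::vector<int> cols_;  // signed, 1-based column numbers
};
```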
I need to shuffle an array so that every element changes its position.
Given an array [0,1,2,3], it would be OK to get [1,0,3,2] or [3,2,0,1] but not [3,1,2,0] (because 1 and 2 are left unchanged).
I suppose the algorithm is not language-specific, but just in case: I need it in a C++ program (and I cannot use std::random_shuffle because of the additional requirement).
What about this?
1. Allocate an array which contains the numbers from 0 to arrayLength-1.
2. Shuffle the array.
3. If there is no element in the array whose index equals its value, continue to step 4; otherwise repeat from step 2.
4. Use the shuffled array values as indexes into your array.
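The steps above as a C++ sketch, using std::shuffle with std::mt19937 in place of std::random_shuffle (the function name is illustrative). For large n, about 1/e of all permutations leave no element in place, so roughly e (about 2.7) shuffles are needed on average.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

template <class T>
std::vector<T> shuffleAllMoved(const std::vector<T>& a, std::mt19937& rng) {
    if (a.size() < 2) return a;                          // no valid result exists
    std::vector<std::size_t> idx(a.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});   // step 1: 0..n-1
    bool hasFixedPoint;
    do {
        std::shuffle(idx.begin(), idx.end(), rng);       // step 2
        hasFixedPoint = false;
        for (std::size_t i = 0; i < idx.size(); ++i)     // step 3
            if (idx[i] == i) { hasFixedPoint = true; break; }
    } while (hasFixedPoint);
    std::vector<T> out;                                  // step 4: indexes into a
    out.reserve(a.size());
    for (std::size_t i : idx) out.push_back(a[i]);
    return out;
}
```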
For each element e:
    If there is an element to the left of e:
        Select a random element r to the left of e
        Swap r and e
This guarantees that each value isn't in the position that it started, but doesn't guarantee that each value changes if there's duplicates.
BeeOnRope notes that though simple, this is flawed. Given the list [0,1,2,3], this algorithm cannot produce the output [1,0,3,2].
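The algorithm above is essentially Sattolo's algorithm. This sketch uses its usual backward form with std::mt19937 (the function name is illustrative); every result is a single cycle through all positions, which is why no element stays put and also why [1,0,3,2] (two 2-cycles) can never appear.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Swap each position with a random position strictly before it,
// walking from the end of the vector towards the front.
template <class T>
void sattoloShuffle(std::vector<T>& v, std::mt19937& rng) {
    for (std::size_t i = v.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 2);  // j < i - 1
        std::swap(v[i - 1], v[pick(rng)]);
    }
}
```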
It's not going to be very random, but you can rotate all the elements by at least one position (v needs at least two elements):
std::rotate(v.begin(), v.begin() + (rand() % (v.size() - 1) + 1), v.end());
If v was {1,2,3,4,5,6,7,8,9} at the beginning, then after rotation it will be, for example: {2,3,4,5,6,7,8,9,1}, or {3,4,5,6,7,8,9,1,2}, etc.
All elements of the array will change position.
I have an idea in mind; I hope it fits your application. Use one more container,
a map<int, vector<int>>. The key is an index, and the vector holds the values already used for that index.
For example, for the first element you use the rand function to pick which element of the array should go there. Then you check the map to see whether that element of the array has already been used for this index.
I have two sets A and B. Set A contains unique elements. Set B contains all elements. Each element in B is a 10 by 10 matrix where all entries are either 1 or 0. I need to scan through set B, and every time I encounter a new matrix I will add it to set A. Therefore set A is a subset of B containing only unique matrices.
It seems like you might really be looking for a way to manage a large, sparse array. Trivially, you could use a hash map with your giant index as your key, and your data as the value. If you talk more about your problem, we might be able to find a more appropriate data structure for your problem.
Update:
If set B is just some set of matrices and not the set of all possible 10x10 binary matrices, then you just want a sparse array. Every time you find a new matrix, you compute its key (which could simply be the matrix converted into a 100 digit binary value, or even a 100 character string!), look up that index. If no such key exists, insert the value 1 for that key. If the key does exist, increment and re-store the new value for that key.
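A minimal sketch of that sparse-array idea, assuming each 10x10 matrix is stored as a std::bitset<100> (std::hash has a specialization for std::bitset, so it can key a std::unordered_map directly). The class name is hypothetical; the map's key set plays the role of A.

```cpp
#include <bitset>
#include <cstddef>
#include <unordered_map>

using Matrix = std::bitset<100>;  // hypothetical layout: bit r*10 + c

class MatrixCounter {
public:
    // Returns true the first time m is seen (i.e. m is new and belongs in A).
    bool add(const Matrix& m) { return ++counts_[m] == 1; }

    int count(const Matrix& m) const {
        auto it = counts_.find(m);
        return it == counts_.end() ? 0 : it->second;
    }

    std::size_t distinct() const { return counts_.size(); }

private:
    std::unordered_map<Matrix, int> counts_;  // matrix -> times seen
};
```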
Here is some code, maybe not very efficient:
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>
// I assume your 10x10 boolean matrix is implemented as a bitset of 100 bits.
// Comparison of bitsets
template<size_t N>
class bitset_comparator
{
public :
bool operator () (const std::bitset<N> & a, const std::bitset<N> & b) const
{
for(size_t i = 0 ; i < N ; ++i)
{
if( !a[i] && b[i] ) return true ;
else if( !b[i] && a[i] ) return false ;
}
return false ;
}
} ;
int main(int, char * [])
{
std::set< std::bitset<100>, bitset_comparator<100> > A ;
std::vector< std::bitset<100> > B ;
// Fill B in some manner ...
// Keeping unique elements in A
std::copy(B.begin(), B.end(), std::inserter(A, A.begin())) ;
}
You can use std::list instead of std::vector. The relative order of elements in B is not preserved in A (elements in A are sorted).
EDIT : I inverted A and B in my first post. It's correct now. Sorry for the inconvenience. I also corrected the comparison functor.
Each element in the B is a 10 by 10 matrix where all entries are either 1 or 0.
Good, that means it can be represented by a 100-bit number. Let's round that up to 128 bits (sixteen bytes).
One approach is to use linked lists - create a structure like (in C):
typedef struct sNode {
unsigned char bits[16];
struct sNode *next;
} tNode;
and maintain the entire list B as a sorted linked list.
The performance will be somewhat less (a) than using the 100-bit number as an array index into a truly immense (to the point of impossible given the size of the known universe) array.
When it comes time to insert a new item into B, insert it at its desired position (before one that's equal or greater). If it was a brand new one (you'll know this if the one you're inserting before is different), also add it to A.
(a) Though probably not unmanageably so - there are options you can take to improve the speed.
One possibility is to use skip lists for faster traversal during searches. A skip list adds further pointers that reference not the next element but one 10 (or 100, or 1000) elements along. That way you can get close to the desired element reasonably quickly and then do the one-step search from that point.
Alternatively, since you're dealing with bits, you can divide B into (for example) 1024 sub-B lists. Use the first 10 bits of the 100-bit value to choose which sub-B to use, and store only the remaining 90 bits. That alone would speed up searches by a factor of about 1000 on average (use more leading bits and more sub-Bs if you need more improvement).
You could also use a hash on the 100-bit value to generate a smaller key which you can use as an index into an array/list, but I don't think that will give you any real advantage over the method in the previous paragraph.
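A sketch of the bucketing idea, with one swapped-in choice flagged: a std::set (a balanced tree) stands in for each sorted sub-B linked list, and the remaining 90 bits are kept as a '0'/'1' string for simplicity. Matrices are assumed to be std::bitset<100>; the class name is hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

class BucketedMatrixSet {
public:
    BucketedMatrixSet() : buckets_(1024) {}

    // Returns true if m was not stored before (i.e. it is new and goes into A).
    bool insert(const std::bitset<100>& m) {
        std::size_t bucket = 0;
        for (std::size_t i = 0; i < 10; ++i)         // first 10 bits -> 0..1023
            if (m[i]) bucket |= std::size_t{1} << i;
        std::string rest;
        rest.reserve(90);
        for (std::size_t i = 10; i < 100; ++i)       // remaining 90 bits
            rest.push_back(m[i] ? '1' : '0');
        return buckets_[bucket].insert(rest).second;
    }

private:
    std::vector<std::set<std::string>> buckets_;     // one sorted sub-B per prefix
};
```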
Convert each matrix into a string of 100 binary digits. Now run it through the Linux utilities:
sort | uniq
If you really need to do this in C++, you can implement your own merge sort; the uniq part then becomes trivial.
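In C++ the same effect is available without writing a merge sort: std::sort puts duplicates next to each other, and std::unique then removes them, just like sort | uniq. This sketch assumes each matrix has already been converted to its 100-character '0'/'1' string; the function name is illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// sort | uniq: after sorting, duplicates are adjacent and std::unique
// moves one copy of each to the front; erase drops the leftover tail.
std::vector<std::string> uniqueMatrices(std::vector<std::string> rows) {
    std::sort(rows.begin(), rows.end());
    rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
    return rows;
}
```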
You don't need N buckets, where N is the number of all possible inputs. A binary tree will do just fine; that is how the set class in C++ is typically implemented.
#include <set>
#include <vector>
std::vector<std::vector<std::vector<int> > > A; // vector of 10x10 matrices
// fill the matrices in A here
std::set<std::vector<std::vector<int> > > B(A.begin(), A.end()); // voila!
// now B contains all elements in A, but only once for duplicates
Does anyone know if it's possible to turn this from O(m * n) to O(m + n)?
vector<int> theFirst;
vector<int> theSecond;
vector<int> theMatch;
theFirst.push_back( -2147483648 );
theFirst.push_back(2);
theFirst.push_back(44);
theFirst.push_back(1);
theFirst.push_back(22);
theFirst.push_back(1);
theSecond.push_back(1);
theSecond.push_back( -2147483648 );
theSecond.push_back(3);
theSecond.push_back(44);
theSecond.push_back(32);
theSecond.push_back(1);
for( size_t i = 0; i < theFirst.size(); i++ )
{
for( size_t x = 0; x < theSecond.size(); x++ )
{
if( theFirst[i] == theSecond[x] )
{
theMatch.push_back( theFirst[i] );
}
}
}
Put the contents of the first vector into a hash set, such as std::unordered_set. That is O(m). Scan the second vector, checking if the values are in the unordered_set and keeping a tally of those that are. That is n lookups of a hash structure, so O(n). So, O(m+n). If you have l elements in the overlap, you may count O(l) for adding them to the third vector. std::unordered_set is in the C++0x draft and available in the latest gcc versions, and there is also an implementation in boost.
Edited to use unordered_set
Using C++2011 syntax:
unordered_set<int> firstMap(theFirst.begin(), theFirst.end());
for (const int& i : theSecond) {
if (firstMap.find(i)!=firstMap.end()) {
cout << "Duplicate: " << i << endl;
theMatch.push_back(i);
}
}
This outputs:
Duplicate: 1
Duplicate: -2147483648
Duplicate: 44
Duplicate: 1
Now, the question remains: what do you want to do with duplicates in the originals? Explicitly, how many times should 1 appear in theMatch: 1, 2, or 4 times?
Using this: http://www.cplusplus.com/reference/algorithm/set_intersection/
I believe you should be able to achieve O(m log m + n log n) (set_intersection requires that the input ranges already be sorted).
This might behave a bit differently from your solution for duplicate elements, however.
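A sketch of that approach (the function name is illustrative): sort copies of both vectors, then let std::set_intersection do a single linear pass. For duplicates it keeps min(count in first, count in second) copies, which is the difference from the nested loop mentioned above.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// O(m log m + n log n) for the sorts, then O(m + n) for the intersection.
std::vector<int> intersectSorted(std::vector<int> first, std::vector<int> second) {
    std::sort(first.begin(), first.end());
    std::sort(second.begin(), second.end());
    std::vector<int> match;
    std::set_intersection(first.begin(), first.end(), second.begin(), second.end(),
                          std::back_inserter(match));
    return match;
}
```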
Please correct me if I am wrong:
you are suggesting the following solution to the intersection problem:
sort both vectors, then iterate through both sorted vectors in step until we reach a common element,
so the overall complexity will be
(n*log(n) + m*log(m)) + (n + m)
assuming k*log(k) as the complexity of sorting.
Am I right?
Of course, the complexity will depend on the complexity of sorting.
I would sort the longer array O(n*log (n)), search for elements from the shorter array O(m*log (n)). Total is then O(n*log(n) + m*log (n) )
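A sketch of the sort-and-search approach (the function name is illustrative). Each element of the shorter vector that occurs in the longer one is emitted once per occurrence in the shorter vector, in its original order.

```cpp
#include <algorithm>
#include <vector>

// Sort the longer vector once (O(n log n)), then binary-search it for each
// element of the shorter one (O(m log n)).
std::vector<int> matchBySearch(std::vector<int> longer, const std::vector<int>& shorter) {
    std::sort(longer.begin(), longer.end());
    std::vector<int> match;
    for (int x : shorter)
        if (std::binary_search(longer.begin(), longer.end(), x))
            match.push_back(x);
    return match;
}
```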
Assuming you want to produce theMatch from two data sets, and you don't care about the data sets themselves, put one in an unordered_map (available currently from Boost and listed in the final committee draft for C++11), mapping the key to an integer that increases whenever added to, and therefore keeps track of the number of times the key occurs. Then, when you get a hit on the other data set, you push_back the hit the number of times it occurred in the first time.
You can get to O(n log n + m log m) by sorting the vectors first, or O(n log n + m log n) by creating a std::map of one of them (each of the m lookups in a std::map costs O(log n)).
Caveat: these are not order-preserving operations, and theMatch will come out in different orders with different techniques. It looks to me like the order is likely considered arbitrary. If the order given in the code above is necessary, I don't think there's a better algorithm.
Edit:
Take data set A and data set B, of type Type. Create an unordered_map<Type, int>.
Go through data set A, and check each member to see if it's in the map. If not, add the element with the int 1 to the map. If it is, increment the int. Each of these operations is O(1) on the average, so this step is O(len A).
Go through data set B, and check each member to see if it's in the map. If not, go on to the next. If so, push_back the member onto the destination queue. The int is the number of times that value is in data set A, so do the push_back the number of times the member's in A to duplicate the behavior given. Each of these operations is on the average O(1), so this step is O(len B).
This is average behavior. If you always hit the worst case, you're back with O(m*n). I don't think there's a way to guarantee O(m + n).
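The counting steps can be sketched with int elements as follows (the function name is illustrative). It emits the same multiset of matches as the nested loop in the question, but in a different order, in line with the caveat above; the expected cost is O(len A + len B) plus the size of the output.

```cpp
#include <unordered_map>
#include <vector>

std::vector<int> matchWithCounts(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_map<int, int> countInA;
    for (int x : a) ++countInA[x];               // step 1: O(len A) on average
    std::vector<int> match;
    for (int x : b) {                            // step 2: O(len B) on average
        auto it = countInA.find(x);
        if (it == countInA.end()) continue;
        for (int i = 0; i < it->second; ++i)     // repeat as often as x is in A
            match.push_back(x);
    }
    return match;
}
```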
If the order of the elements in the resulting array/set doesn't matter then the answer is yes.
For arbitrary element types with some ordering defined, the best algorithm is O(max(m,n)*log(min(m,n))). For numbers of limited size, the best algorithm is O(m+n).
Construct the set of elements of the smaller array: for arbitrary elements, just sorting is OK; for numbers of limited size, it should be something like the intermediate count table of a counting sort.
Iterate through the larger array and check whether each element is in the set constructed earlier: for arbitrary elements, binary search is OK (O(log(min(n,m))) per element), and for numbers the check is a single O(1) lookup.
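For numbers of limited size, a minimal sketch assuming all values lie in [0, maxValue] (an assumption, not part of the question): a boolean table indexed by value serves as the set, so each membership check is O(1). Initializing the table costs O(maxValue), which is treated as a constant for bounded numbers; the function name is illustrative.

```cpp
#include <cstddef>
#include <vector>

std::vector<int> matchBounded(const std::vector<int>& a, const std::vector<int>& b,
                              int maxValue) {
    const std::vector<int>& smaller = a.size() <= b.size() ? a : b;
    const std::vector<int>& larger  = a.size() <= b.size() ? b : a;
    std::vector<bool> present(static_cast<std::size_t>(maxValue) + 1, false);
    for (int x : smaller) present[static_cast<std::size_t>(x)] = true;  // build the set
    std::vector<int> match;
    for (int x : larger)                                                // O(1) checks
        if (present[static_cast<std::size_t>(x)]) match.push_back(x);
    return match;
}
```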