Most efficient algorithm for Two-sum problem (involving indices) - c++

The problem statement is given an array and a given sum "T", find all the pairs of indices of the elements in the array which add up to T. Additional requirements/constraints:
Indexing starts from 0
The indices must be displayed with lower index first (Ex: 24, 30 instead of 30, 24)
The indices must be displayed in ascending order (Ex: if we find (1,3), (0,2) and (5,8) the output must be (0,2) (1,3) (5,8)
There can be duplicate elements in the array, which also have to be considered
Here's my code in C++, I used the hash-table approach using unordered_set:
void Twosum(vector <int> res, int T){
int temp; int ti = -1;
unordered_set<int> s;
vector <int> res2 = res; //Just a copy of the input vector
vector <tuple<int, int>> indices; //Result to be output
for (int i = 0; i < (int)res.size(); i++){
temp = T - res[i];
if (s.find(temp) != s.end()){
while(ti < (int)res.size()){ //While loop for finding all the instances of temp in the array,
//not part of the original hash-table algorithm, something I added
ti = find(res2.begin(), res2.end(), temp) - res2.begin();
//Here find() takes O(n) time which is an issue
res2[ti] = lim; //To remove that instance of temp so that new instances
//can be found in the while loop, here lim = 10^9
if(i <= ti) indices.push_back(make_tuple(i, ti));
else indices.push_back(make_tuple(ti, i));
}
}
s.insert(res[i]);
}
if(ti == -1)
{cout<<"-1 -1"; //if no indices were found
return;}
sort(indices.begin(), indices.end()); //sorting since unordered_set stores elements randomly
for(int i=0; i<(int)indices.size(); i++)
cout<<get<0>(indices[i])<<" "<<get<1>(indices[i])<<endl;
}
This has multiple issues:
firstly that while loop doesn't work as intended, instead it shows SIGABRT error (free(): invalid pointer). The ti index is also somehow going beyond the vector bounds, even though I have that check in the while loop.
Secondly the find() function works in O(n) time, which increases the overall complexity to O(n^2), which is causing my program to timeout during execution. However that function is required since we have to output indices.
Lastly this unordered-set implementation doesn't seem to work when there are many duplicate elements in the array (since sets only take unique elements), which is one of the main constraints of the problem. This makes me think we need some sort of hash function or hashmap to deal with the duplicates? I'm not sure...
All the different algorithms I've found for this on the internet have dealt with just printing the elements and not the indices, hence I've had no luck with this problem.
If any of you know an optimal algorithm for this while also satisfying the constraints and running under O(n) time, your help would be highly appreciated. Thank you in advance.

Here is a pseudo-code answering your question, using hash tables (or maps) and set. I let you translate this to cpp using adapted data structures (in this case, classic hashmaps and sets will do the job well).
Notations: we will denote A the array, n its length, and T the "sum".
// first we build a map element -> {set of indices corresponding to this element}
Let M be an empty map; // or hash map, or hash table, or dictionary
for i from 0 to n-1 do {
Let e = A[i];
if e is not a key of M then {
M[e] = new_set()
}
M[e].add(i)
}
// Now we iterate over the elements
for each key e of M do {
if T-e is a key of M then {
display_combinations(M[e], M[T-e]);
}
}
// The helper function display_combinations
function display_combinations(set1, set2) {
for each element e1 of set1 do {
for element e2 of set2 do {
if e1 < e2 then {
display "(e1, e2)";
} else if e1 > e2 then {
display "(e2, e1)";
}
}
}
}
As said in the comments, the complexity in the worst case of this algorithm is in O(n²). A way to see that we cannot go below this complexity is that the size of the output may be in O(n²), in the case where all elements of the array have the value T/2.
Edit: this pseudo code does not output the pairs in the order. Just store them in an array of pairs, and sort this array before displaying it. Same, I did not treat the case where a pair (i, i) may satisfy the requirement. You may have to consider it (just change e1 > e2 by e1 >= e2 in the last loop)

Related

Find uncommon elements using hashing

I think this is a fairly common question but I didn't find any answer for this using hashing in C++.
I have two arrays, both of the same lengths, which contain some elements, for example:
A={5,3,5,4,2}
B={3,4,1,2,1}
Here, the uncommon elements are: {5,5,1,1}
I have tried this approach- iterating a while loop on both the arrays after sorting:
while(i<n && j<n) {
if(a[i]<b[j])
uncommon[k++]=a[i++];
else if (a[i] > b[j])
uncommon[k++]=b[j++];
else {
i++;
j++;
}
}
while(i<n && a[i]!=b[j-1])
uncommon[k++]=a[i++];
while(j < n && b[j]!=a[i-1])
uncommon[k++]=b[j++];
and I am getting the correct answer with this. However, I want a better approach in terms of time complexity since sorting both arrays every time might be computationally expensive.
I tried to do hashing but couldn't figure it out entirely.
To insert elements from arr1[]:
set<int> uncommon;
for (int i=0;i<n1;i++)
uncommon.insert(arr1[i]);
To compare arr2[] elements:
for (int i = 0; i < n2; i++)
if (uncommon.find(arr2[i]) != uncommon.end())
Now, what I am unable to do is to send only those elements to the uncommon array[] which are uncommon to both of them.
Thank you!
First of all, std::set does not have anything to do with hashing. Sets and maps are ordered containers. Implementations may differ, but most likely it is a binary search tree. Whatever you do, you wont get faster that nlogn with them - the same complexity as sorting.
If you're fine with nlogn and sorting, I'd strongly advice just using set_symmetric_difference algorithm https://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference , it requires two sorted containers.
But if you insist on an implementation relying on hashing, you should use std::unordered_set or std::unordered_map. This way you can be faster than nlogn. You can get your answer in nm time, where n = a.size() and m = b.size(). You should create two unordered_set`s: hashed_a, hashed_b and in two loops check what elements from hashed_a are not in hashed_b, and what elements in hashed_b are not in hashed_a. Here a pseudocode:
create hashed_a and hashed_b
create set_result // for the result
for (a_v : hashed_a)
if (a_v not in hashed_b)
set_result.insert(a_v)
for (b_v : hashed_b)
if (b_v not in hashed_a)
set_result.insert(b_v)
return set_result // it holds the symmetric diference, which you need
UPDATE: as noted in the comments, my answer doesn't count for duplicates. The easiest way to modify it for duplicates would be to use unordered_map<int, int> with the keys for elements in the set and values for number of encounters.
First, you need to find a way to distinguish between the same values contained in the same array (for ex. 5 and 5 in the first array, and 1 and 1 in the second array). This is the key to reducing the overall complexity, otherwise you can't do better than O(nlogn). A good possible algorithm for this task is to create a wrapper object to hold your actual values, and put in your arrays pointers to those wrapper objects with actual data, so your pointer addresses will serve as a unique identifier for objects. This wrapping will cost you just O(n1+n2) operations, but also an additional O(n1+n2) space.
Now your problem is that you have in both arrays only elements unique to each of those arrays, and you want to find the uncommon elements. This means the (Union of both array elements) - (Intersection of both array elements). Therefore, all you need to do is to push all the elements of the first array into a hash-map (complexity O(n1)), and then start pushing all the elements of the second array into the same hash-map (complexity O(n2)), by detecting the collisions (equality of an element from first array with an element from the second array). This comparison step will require O(n2) comparisons in the worst case. So for the maximum performance optimization you could have checked the size of the arrays before starting pushing the elements into the hash-map, and swap the arrays so that the first push will take place with the longest array. Your overall algorithm complexity would be O(n1+n2) pushes (hashings) and O(n2) comparisons.
The implementation is the most boring stuff, so I let it to you ;)
A solution without sorting (and without hashing but you seem to care more about complexity then the hashing itself) is to notice the following : an uncommon element e is an element that is in exactly one multiset.
This means that the multiset of all uncommon elements is the union between 2 multisets:
S1 = The element in A that are not in B
S2 = The element in B that are not in A
Using the std::set_difference, you get:
#include <set>
#include <vector>
#include <iostream>
#include <algorithm>
int main() {
std::multiset<int> ms1{5,3,5,4,2};
std::multiset<int> ms2{3,4,1,2,1};
std::vector<int> v;
std::set_difference( ms1.begin(), ms1.end(), ms2.begin(), ms2.end(), std::back_inserter(v));
std::set_difference( ms2.begin(), ms2.end(), ms1.begin(), ms1.end(), std::back_inserter(v));
for(int e : v)
std::cout << e << ' ';
return 0;
}
Output:
5 5 1 1
The complexity of this code is 4.(N1+N2 -1) where N1 and N2 are the size of the multisets.
Links:
set_difference: https://en.cppreference.com/w/cpp/algorithm/set_difference
compiler explorer: https://godbolt.org/z/o3KGbf
The Question can Be solved in O(nlogn) time-complexity.
ALGORITHM
Sort both array with merge sort in O(nlogn) complexity. You can also use sort-function. For example sort(array1.begin(),array1.end()).
Now use two pointer method to remove all common elements on both arrays.
Program of above Method
int i = 0, j = 0;
while (i < array1.size() && j < array2.size()) {
// If not common, print smaller
if (array1[i] < array2[j]) {
cout << array1[i] << " ";
i++;
}
else if (array2[j] < array1[i]) {
cout << array2[j] << " ";
j++;
}
// Skip common element
else {
i++;
j++;
}
}
Complexity of above program is O(array1.size() + array2.size()). In worst case say O(2n)
The above program gives the uncommon elements as output. If you want to store them , just create a vector and push them into vector.
Original Problem LINK

Find && Delete Duplicate vector elements in a for loop

I have these files in a vector
vector<string> myFileNames ={"TestFile1","TestFile2","copiedFile1","copiedFile2","copiedFile3"};
Should only have unique file left in list i.e. test file 1&2
vector<string> duplicateFilesFound;
To be filled with files that duplicates i.e. copiedFile 1, 2 && 3 are duplicates
vector<string> myMd5Strings;
Filled with md5 hash values
string target = " ";
int count = 0;
for (unsigned int i = 0; i < myFileNames.size(); ++i)
{
target = myMd5Strings[i];
for (unsigned int k = 1; k < myFileNames.size(); ++k)
{
if (target == myMd5Strings[k])
{
myFileNames.erase(myFileNames.begin() + k);
duplicateFilesFound.push_back(myFileNames[k]);
}
}
cout << "Duplicate Count is : " << duplicateFilesFound.size() << endl;
}
One mistake (maybe there are others) is that you are first getting the MD5 to test, but then your inner k loop will erroneously erase the first item matching the hash. For example, if your target (the i value) is at element 2, your loop doesn't check if k == 2 so as to skip over this first occurrence.
The test should be:
if ( i != k && target == myMd5Strings[k])
The second issue is that if you erase an item, you need to keep the k value where it is and not increment. Otherwise you will skip over the next item to check and miss this item if it matches the hash.
But even with pointing these gotchas out, your attempt is an order of n^2 -- you have 100 names, that is 10,000 iterations, 1,000 names, 1,000,000 iterations, etc. At the end of the day, this will perform poorly on large lists.
The approach mentioned below is logarithmic (due to the sort) in time, not quadratic, thus executing much faster for large list sizes.
But a few structural changes could be done. A better approach would be to store the file name, along with MD5 hash in a single struct. Then declare a container of this struct.
struct file_info
{
std::string filename;
std::string md5hash;
};
//...
std::vector<file_info> fInfoV;
Once you have this, then everything becomes very simple using some basic algorithm functions.
To remove duplicates with a certain md5 hash can be done by first sorting the list using the hash value as the sort criteria, and then removing the duplicates using std::unique and then erase.
#include <algorithm>
//...
std::sort(fInfoV.begin(), fInfoV.end(),
[](const file_info&f1, const file_info& f2)
{ f1.md5hash < f2.md5hash; });
// uniqueify the vector
auto iter = std::unique(fInfoV.begin(), fInfoV.end(),
[](const file_info& f1, const file_info& f2)
{ f1.md5hash == f2.md5hash; });
// for kicks, get the number of duplicates found
auto duplicateCount = std::distance(iter, fInfoV.end());
cout << "There are " << duplicateCount << " duplicates found";
// erase these duplicates
fInfoV.erase(iter, fInfoV.end());
The code above, sorts, uniquifies, and erases the duplicates. I threw in std::distance to show you that you can get the number of duplicates without having to create another vector. The code above also overcomes the need to write loops checking for equivalent loop indices, making sure you reseat the inner index, checking for bounds, etc.
As to time complexity:
The std::sort function is guaranteed to have logarithmic complexity.
The std::unique function has linear complexity:
So this will perform much better for large lists.

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
for(int i = 0; i < n; ++i)
{
for(int j = 0; j < n; ++j)
{
if(arr[i] == - arr[j])
return true;
}
}
return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort array ( algorithm with worst case complexity: n.log(n) )
2) create two new arrays, filled with negative and positive numbers from the original array
( so far we've got -> n.log(n) + n + n = n.log(n))
3) ... compare somehow the two new arrays to determine if they have opposite numbers
I'm not pretty sure my ideas are correct, but I'm opened to suggestions.
An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
I would use an std::unordered_set and check to see if the opposite of the number already exist in the set. if not insert it into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
if(res.count(-e) > 0)
std::cout << -e << " already exist\n";
else
res.insert(e);
}
Output:
opposite of 10 alrready exist
opposite of 20 alrready exist
opposite of -30 alrready exist
Live Example
Let's see that you can simply add all of elements to the unordered_set and when you are adding x check if you are in this set -x. The complexity of this solution is O(n). (as #Hurkyl said, thanks)
UPDATE: Second idea is: Sort the elements and then for all of the elements check (using binary search algorithm) if the opposite element exists.
You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]
if (-e) is in t:
return true
insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
if (s.count(-e) > 0) {
return true;
}
s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).

Is this a shell sort or an insertion sort?

I'm just starting to learn about sorting algorithms and found one online. At first i thought it was a shell sort but it's missing that distinct interval of "k" and the halving of the array so i'm not sure if it is or not. My second guess is an insertion sort but i'm just here to double check:
for(n = 1; n < num; n++)
{
key = A[n];
k = n;
while((k > 0) && (A[k-1] > key))
{
A[k] = A[k-1];
k = k-1;
}
A[k] = key;
}
Also if you can explain why that'd be helpful as well
Shell Sort consists of many insertion sorts that are performed on sub-arrays of the original array.
The code you have provided is insertion sort.
To get shell sort, it would be roughly having other fors around your code changing h (that gap in shell sort) and starting index of the sub-array and inside, instead of moving from k to k-1, you move from k to k+h (or k-h depending on which direction you do the insertion sort)
I think you're right, that does look a lot like an insertion sort.
This fragment assumes A[0] is already inserted. If n == 0, then the k > 0 check will fail and execution will continue at A[k] = key;, properly storing the first element into the array.
This fragment also assumes that A[0:n-1] is already sorted. It inspects A[n] and starts scanning the array backward, moving forward one place every element that is larger than the original A[n] key.
Once the scanning encounters an element less than or equal to the key, it inserts it in that location.
It's called insertion sort because the line A[k] = key inserts the current value in the correct position in the partially sorted array.

Finding smallest value in an array most efficiently

There are N values in the array, and one of them is the smallest value. How can I find the smallest value most efficiently?
If they are unsorted, you can't do much but look at each one, which is O(N), and when you're done you'll know the minimum.
Pseudo-code:
small = <biggest value> // such as std::numerical_limits<int>::max
for each element in array:
if (element < small)
small = element
A better way reminded by Ben to me was to just initialize small with the first element:
small = element[0]
for each element in array, starting from 1 (not 0):
if (element < small)
small = element
The above is wrapped in the algorithm header as std::min_element.
If you can keep your array sorted as items are added, then finding it will be O(1), since you can keep the smallest at front.
That's as good as it gets with arrays.
You need too loop through the array, remembering the smallest value you've seen so far. Like this:
int smallest = INT_MAX;
for (int i = 0; i < array_length; i++) {
if (array[i] < smallest) {
smallest = array[i];
}
}
The stl contains a bunch of methods that should be used dependent to the problem.
std::find
std::find_if
std::count
std::find
std::binary_search
std::equal_range
std::lower_bound
std::upper_bound
Now it contains on your data what algorithm to use.
This Artikel contains a perfect table to help choosing the right algorithm.
In the special case where min max should be determined and you are using std::vector or ???* array
std::min_element
std::max_element
can be used.
If you want to be really efficient and you have enough time to spent, use SIMD instruction.
You can compare several pairs in one instruction:
r0 := min(a0, b0)
r1 := min(a1, b1)
r2 := min(a2, b2)
r3 := min(a3, b3)
__m64 _mm_min_pu8(__m64 a , __m64 b );
Today every computer supports it. Other already have written min function for you:
http://smartdata.usbid.com/datasheets/usbid/2001/2001-q1/i_minmax.pdf
or use already ready library.
If the array is sorted in ascending or descending order then you can find it with complexity O(1).
For an array of ascending order the first element is the smallest element, you can get it by arr[0] (0 based indexing).
If the array is sorted in descending order then the last element is the smallest element,you can get it by arr[sizeOfArray-1].
If the array is not sorted then you have to iterate over the array to get the smallest element.In this case time complexity is O(n), here n is the size of array.
int arr[] = {5,7,9,0,-3,2,3,4,56,-7};
int smallest_element=arr[0] //let, first element is the smallest one
for(int i =1;i<sizeOfArray;i++)
{
if(arr[i]<smallest_element)
{
smallest_element=arr[i];
}
}
You can calculate it in input section (when you have to find smallest element from a given array)
int smallest_element;
int arr[100],n;
cin>>n;
for(int i = 0;i<n;i++)
{
cin>>arr[i];
if(i==0)
{
smallest_element=arr[i]; //smallest_element=arr[0];
}
else if(arr[i]<smallest_element)
{
smallest_element = arr[i];
}
}
Also you can get smallest element by built in function
#inclue<algorithm>
int smallest_element = *min_element(arr,arr+n); //here n is the size of array
You can get smallest element of any range by using this function
such as,
int arr[] = {3,2,1,-1,-2,-3};
cout<<*min_element(arr,arr+3); //this will print 1,smallest element of first three element
cout<<*min_element(arr+2,arr+5); // -2, smallest element between third and fifth element (inclusive)
I have used asterisk (*), before min_element() function. Because it returns pointer of smallest element.
All codes are in c++.
You can find the maximum element in opposite way.
Richie's answer is close. It depends upon the language. Here is a good solution for java:
int smallest = Integer.MAX_VALUE;
int array[]; // Assume it is filled.
int array_length = array.length;
for (int i = array_length - 1; i >= 0; i--) {
if (array[i] < smallest) {
smallest = array[i];
}
}
I go through the array in reverse order, because comparing "i" to "array_length" in the loop comparison requires a fetch and a comparison (two operations), whereas comparing "i" to "0" is a single JVM bytecode operation. If the work being done in the loop is negligible, then the loop comparison consumes a sizable fraction of the time.
Of course, others pointed out that encapsulating the array and controlling inserts will help. If getting the minimum was ALL you needed, keeping the list in sorted order is not necessary. Just keep an instance variable that holds the smallest inserted so far, and compare it to each value as it is added to the array. (Of course, this fails if you remove elements. In that case, if you remove the current lowest value, you need to do a scan of the entire array to find the new lowest value.)
An O(1) sollution might be to just guess: The smallest number in your array will often be 0. 0 crops up everywhere. Given that you are only looking at unsigned numbers. But even then: 0 is good enough. Also, looking through all elements for the smallest number is a real pain. Why not just use 0? It could actually be the correct result!
If the interviewer/your teacher doesn't like that answer, try 1, 2 or 3. They also end up being in most homework/interview-scenario numeric arrays...
On a more serious side: How often will you need to perform this operation on the array? Because the sollutions above are all O(n). If you want to do that m times to a list you will be adding new elements to all the time, why not pay some time up front and create a heap? Then finding the smallest element can really be done in O(1), without resulting to cheating.
If finding the minimum is a one time thing, just iterate through the list and find the minimum.
If finding the minimum is a very common thing and you only need to operate on the minimum, use a Heap data structure.
A heap will be faster than doing a sort on the list but the tradeoff is you can only find the minimum.
If you're developing some kind of your own array abstraction, you can get O(1) if you store smallest added value in additional attribute and compare it every time a new item is put into array.
It should look something like this:
class MyArray
{
public:
MyArray() : m_minValue(INT_MAX) {}
void add(int newValue)
{
if (newValue < m_minValue) m_minValue = newValue;
list.push_back( newValue );
}
int min()
{
return m_minValue;
}
private:
int m_minValue;
std::list m_list;
}
//find the min in an array list of #s
$array = array(45,545,134,6735,545,23,434);
$smallest = $array[0];
for($i=1; $i<count($array); $i++){
if($array[$i] < $smallest){
echo $array[$i];
}
}
//smalest number in the array//
double small = x[0];
for(t=0;t<x[t];t++)
{
if(x[t]<small)
{
small=x[t];
}
}
printf("\nThe smallest number is %0.2lf \n",small);
Procedure:
We can use min_element(array, array+size) function . But it iterator
that return the address of minimum element . If we use *min_element(array, array+size) then it will return the minimum value of array.
C++ implementation
#include<bits/stdc++.h>
using namespace std;
int main()
{
int num;
cin>>num;
int arr[10];
for(int i=0; i<num; i++)
{
cin>>arr[i];
}
cout<<*min_element(arr,arr+num)<<endl;
return 0;
}
int small=a[0];
for (int x: a.length)
{
if(a[x]<small)
small=a[x];
}
C++ code
#include <iostream>
using namespace std;
int main() {
int n = 5;
int arr[n] = {12,4,15,6,2};
int min = arr[0];
for (int i=1;i<n;i++){
if (min>arr[i]){
min = arr[i];
}
}
cout << min;
return 0;
}