Find the minimum time - c++

The problem statement is:
We want to scan N documents using two scanners. If S1 and S2 are the time taken by the scanner 1 and scanner 2 to scan a single document, find the minimum time required to scan all the N documents.
Example:
S1 = 2, S2 = 4, N = 2
Output: 4
Explanation: Here we have two possibilities.
Either scan both documents in scanner 1 or
scan one document in each scanner.
In both cases the time required is 4.
I came up with a solution where we generate all the combinations and insert them into a set. The minimum value will be the first element in the set.
The problem is that this solution has time complexity O(n), but the accepted time complexity is O(log n).
I am thinking along the lines of binary search but can't come up with a solution.
What should be the approach?

If you could scan a fraction of a document, the fastest way would be to scan N*S2/(S1+S2) documents in scanner 1, and N*S1/(S1+S2) documents in scanner 2. And if these are not integers, you must round one of them up and one of them down, which gives you just two possibilities to check. This is O(1), not O(log n).
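For illustration, a minimal C++ sketch of that O(1) idea (the function name and signature are mine, not from the question): both scanners run in parallel, so the finishing time for a split is max(docs1*S1, docs2*S2), and the best integer split is the floor or ceiling of N*S2/(S1+S2).

#include <algorithm>

// Sketch only: try the two integer splits around the continuous optimum.
long long minScanTime(long long s1, long long s2, long long n) {
    long long k = n * s2 / (s1 + s2);        // continuous optimum, rounded down
    long long best = -1;
    for (long long docs1 : {k, k + 1}) {     // round down and round up
        if (docs1 < 0 || docs1 > n) continue;
        long long t = std::max(docs1 * s1, (n - docs1) * s2);
        if (best < 0 || t < best) best = t;
    }
    return best;
}

With S1 = 2, S2 = 4, N = 2 this returns 4, matching the example above.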

Well, I'm sharing the O(log n) approach. With binary search on the answer (the time), we can find the optimal time.
For binary search we need an upper bound and a lower bound. Let's take 0 as the lower bound. Now we need to find the upper bound.
What will be the minimum time required if we scan all the documents in one scanner? It will be min(S1, S2) * N, right? Note: here we are not using the other scanner, which could scan documents while the first one is busy. So min(S1, S2) * N will be our upper bound.
We've got our bounds,
Lower bound = 0
Upper bound = min(S1,S2) * N
Now do binary search on the time: take a mid and check how many documents can be scanned by scanner 1 and scanner 2 within mid time. Whenever the total number of scanned documents is >= N, mid is a candidate answer.
You can read up on binary search here - https://www.hackerearth.com/practice/algorithms/searching/binary-search/tutorial/
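A rough C++ sketch of this binary search, using the bounds described above (the function name is mine):

#include <algorithm>

long long minScanTimeBS(long long s1, long long s2, long long n) {
    long long lo = 0, hi = std::min(s1, s2) * n;       // bounds discussed above
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        long long scanned = mid / s1 + mid / s2;       // documents finishable within `mid` time
        if (scanned >= n)
            hi = mid;        // feasible: try a smaller time
        else
            lo = mid + 1;    // infeasible: need more time
    }
    return lo;
}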


find number of substrings that become palindromes after you reduce the substring

This is an interview problem.
You are given a string. Example S = "abbba".
If I choose a substring, say "bbba", then this selection gets reduced to "ba" (every run of consecutively repeating characters is collapsed to a single character).
You need to find the number of odd-length and even-length substring selections that would result in a palindrome after reduction.
I could solve it using 2D DP if it weren't for the reduction condition, which makes the problem complicated.
First, reduce your entire string and save, for each character of the reduced string, how many times it appears in the corresponding run of the original string (this can be done in O(n)). Let the reduced string be x1x2...xk and the respective quantities be q1, q2, ..., qk.
Calculate the 2D DP you mentioned, but for the reduced string (this takes O(k^2)).
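A small C++ sketch of this first step (reduceWithCounts is a name I made up): collapse consecutive repeats and record the length of each run.

#include <string>
#include <vector>
#include <utility>

// Returns one (character, run length) pair per character of the reduced string.
std::vector<std::pair<char, long long>> reduceWithCounts(const std::string& s) {
    std::vector<std::pair<char, long long>> runs;
    for (char c : s) {
        if (!runs.empty() && runs.back().first == c)
            ++runs.back().second;        // extend the current run
        else
            runs.push_back({c, 1});      // start a new run
    }
    return runs;                         // "abbba" -> ('a',1), ('b',3), ('a',1)
}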
Now it becomes a combinatorics problem. It can be solved using simple combinatorial principles, like the additive principle and the multiplicative principle. The total number of substrings that become palindromes after you reduce them is:
q1 * dp[1][1] + q2 * (dp[1][2] + dp[2][2]) + ... + qk * (dp[1][k] + dp[2][k] + ... + dp[k][k])
It takes O(k^2) to compute this sum.
Now, how many of those have odd length? How many have even length?
To answer that, you need a few more simple observations about odd and even numbers and some careful case-by-case analysis. I will let you try it. Let me know if you run into problems.

Two disjoint intervals minimum sum

Given a list of N intervals [a,b], each with a cost c, find the minimum total cost of 2 non-overlapping intervals. I have an algorithm in O(n^2) ( pastebin.com/kveAZTwv ) but I can't find the O(N log N) one.
The first value is the number of intervals. Each of the other lines is: a b c, where a is the beginning of the interval, b the end and c the cost.
example :
input :
3
0 10 1
1 2 2
9 12 2
output :
4
Here is the basic idea for an algorithm in O(n log n); I am sure it can be done more efficiently:
1) Sort all intervals according to their endpoint. Each endpoint is a potential split point where intervals to the right and intervals to the left do not overlap.
2) Now scan through the sorted intervals and for each split point remember the minimum-cost interval to the left.
3) Sort all intervals according to their start point.
4) For each split point additionally memorize the minimum-cost interval to the right (having its start point after the split point). This is also possible with a single scan from back to front over the sorted intervals.
5) For each split point add the two corresponding costs and take the minimum.
Sorry, this is a bit informal since I am on mobile.
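A hedged C++ sketch of this idea (names are mine, and it uses a binary search over the sorted endpoints instead of the second sort-and-scan described above). It assumes "non-overlapping" means one interval must end strictly before the other starts.

#include <vector>
#include <algorithm>
#include <climits>

struct Interval { long long a, b, c; };   // start, end, cost

long long minDisjointPairCost(std::vector<Interval> v) {
    // Steps 1)+2): sort by endpoint and build a prefix minimum of costs.
    std::sort(v.begin(), v.end(),
              [](const Interval& x, const Interval& y) { return x.b < y.b; });
    std::vector<long long> ends(v.size()), prefMin(v.size());
    long long run = LLONG_MAX;
    for (size_t i = 0; i < v.size(); ++i) {
        run = std::min(run, v[i].c);
        ends[i] = v[i].b;
        prefMin[i] = run;                 // cheapest interval among the first i+1 endpoints
    }
    // For each interval, find the cheapest interval that ends before it starts.
    long long best = LLONG_MAX;
    for (const Interval& iv : v) {
        auto it = std::lower_bound(ends.begin(), ends.end(), iv.a);
        if (it != ends.begin())
            best = std::min(best, iv.c + prefMin[(it - ends.begin()) - 1]);
    }
    return best;                          // LLONG_MAX if no disjoint pair exists
}

On the sample input (0 10 1, 1 2 2, 9 12 2) this yields 2 + 2 = 4, matching the expected output.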

get the number of overlapping intervals, given two lists of intervals

I recently came across an interesting problem:
Given two lists of intervals, find the total number of overlapping pairs of intervals between the two lists.
Example
L1: ([1,2][2,3][4,5][6,7])
L2: ([1,5][2,3][4,7][5,7])
[1,5] overlaps [1,2] [2,3] [4,5]
[2,3] overlaps [1,2] [2,3]
[4,7] overlaps [4,5] [6,7]
[5,7] overlaps [4,5] [6,7]
total = 3+2+2+2 = 9
Obviously the brute force approach works, but it's too slow (I need something better than O(n^2)).
I also found a similar problem here, but it's not exactly the same...
Any help is appreciated
Make two sorted lists of pairs (value; +1 or -1 for the start or end of an interval).
Keep two counters, Count1 and Count2, which show the number of currently active intervals from the first and the second list.
Walk through both lists in merge fashion.
When you get a pair from the first list with +1, increment Count1.
When you get a pair from the first list with -1, decrement Count1 and add Count2 to the result.
Do the same for pairs from the second list.
Pseudocode for the last stage
CntA = 0
CntB = 0
Res = 0
ia = 0
ib = 0
while (ia < A.Length) and (ib < B.Length)
    if Compare(A[ia], B[ib]) <= 0
        CntA = CntA + A[ia].Flag
        if A[ia].Flag < 0            // an interval from the first list ends:
            Res = Res + CntB         // it overlapped every currently open interval of the second list
        ia++
    else
        CntB = CntB + B[ib].Flag
        if B[ib].Flag < 0            // an interval from the second list ends
            Res = Res + CntA
        ib++
A subtle moment is the comparison Compare(A[ia], B[ib]) <= 0.
Here we should also take the flags into account, to correctly treat situations where endpoints only touch, like [1..2][2..3] (you consider this situation an intersection). So both the sorting and the merge comparators should use a synthetic value like 3 * A[ia].Value - A[ia].Flag. With such a comparison, the start of an interval is processed before the end of an interval at the same coordinate.
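A C++ sketch of the event construction and this synthetic key (Event and makeEvents are names I made up; the answer's own test code is in Delphi):

#include <vector>
#include <algorithm>
#include <utility>

struct Event {
    long long value;   // endpoint coordinate
    int flag;          // +1 for a start, -1 for an end
    // Starts sort before ends at the same coordinate, so touching
    // intervals like [1..2] and [2..3] are treated as intersecting.
    long long key() const { return 3 * value - flag; }
};

std::vector<Event> makeEvents(const std::vector<std::pair<long long, long long>>& intervals) {
    std::vector<Event> ev;
    for (auto [a, b] : intervals) {
        ev.push_back({a, +1});
        ev.push_back({b, -1});
    }
    std::sort(ev.begin(), ev.end(),
              [](const Event& x, const Event& y) { return x.key() < y.key(); });
    return ev;
}

The Compare in the pseudocode above would then simply compare these keys.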
P.S. I made a quick test in Delphi. It works for the given data set and a couple of others.
Delphi code (ideone FPC doesn't compile it due to generics)
Try looking for sweep line algorithms; they will give you the fastest solution.
You can check the short description at the TopCoder site or watch the video from Robert Sedgewick. These describe a somewhat harder problem but should give you an approach to solving yours.
The main idea is to walk over the sorted list of begins and ends of your segments, each time updating a special list of currently intersecting segments.
For this task you will have two such intersection lists, one for each original list. At the start both intersection lists are empty. On reaching the beginning of a segment you add it to the appropriate intersection list, and it obviously intersects all the segments currently in the other intersection list. On reaching the end of a segment, just remove it from its intersection list.
This algorithm will give you O(n log n) time and, in the worst case, O(n) memory.
You may be able to use std::set_intersection in a loop over the second array to match it with each item in the first array. But I am not sure if the performance will match your requirements.
I recently stumbled upon the Interval Tree ADT when tackling a similar question - I suspect it'll be useful for you, whether you implement it or not.
It is basically a ternary tree, and I built it with nodes containing:
Left sub-tree containing intervals less than the current node
Right sub-tree containing intervals greater than the current node
A list of overlapping intervals
An interval value encompassing all the overlapping intervals
After building the tree in O(n*log(n)), a query function - to check overlapping intervals - should be O(log(n) + m) where m is the number of overlapping intervals reported.
Note that on creation, sorting by end value in the interval and splitting the list should help keep things balanced.
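If it helps, here is a rough C++ sketch of the node layout described above (just the data layout, with names of my choosing, not a full implementation):

#include <vector>
#include <memory>

struct Interval { long long lo, hi; };

struct IntervalTreeNode {
    long long center;                           // split value for this node
    std::unique_ptr<IntervalTreeNode> left;     // intervals entirely below center
    std::unique_ptr<IntervalTreeNode> right;    // intervals entirely above center
    std::vector<Interval> overlapping;          // intervals that contain center
    Interval span;                              // hull of the overlapping intervals
};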

Similar elements from lists

Given N lists with M numbers in each list, we have to pick ONE element from each list such
that every pair ai, aj has |ai - aj| as small as possible.
For example
we have 3 lists
{12,16,67,43}
{7,17,68,48}
{14,15,77,54}
And to minimize the result we have to pick
number 16 from list 1
number 17 from list 2
number 15 from list 3
so
|16-17|=1
|16-15|=1
|17-15|=2
so our result is :2
How can it be solved quickly? In N*M time? Or in log-something time?
Chris
If you use a linear search, the complexity is O(N*M) for one match (i.e., for each element in set j, do a linear search for the most similar item from set i, and pick the smallest of those results).
If you sort each set first, you get (at least) O(N log N) + O(M log M) for the sorts, and O(M log N) for the searches (where N is the number of elements in set i, and M the number of elements in set j). If you walk through the two sorted sets together you can probably reduce that to O(N + M) for the combined search.
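A small C++ sketch of that final "walk through the two sets together" step (function name is mine): the smallest |a - b| between two sorted lists in O(N + M).

#include <vector>
#include <algorithm>
#include <climits>
#include <cstdlib>

long long closestPairDiff(const std::vector<long long>& A,
                          const std::vector<long long>& B) {
    size_t i = 0, j = 0;
    long long best = LLONG_MAX;
    while (i < A.size() && j < B.size()) {
        best = std::min(best, std::llabs(A[i] - B[j]));
        // Advance the pointer at the smaller value; advancing the other
        // pointer could only increase the difference for this element.
        if (A[i] < B[j]) ++i; else ++j;
    }
    return best;
}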

Calculating a relative Levenshtein distance - make sense?

I am using both Daitch-Mokotoff soundexing and Damerau-Levenshtein to find out if a user entry and a value in the application are "the same".
Is Levenshtein distance supposed to be used as an absolute value? If I have a 20 letter word, a distance of 4 is not so bad. If the word has 4 letters...
What I am doing now is taking distance / length to get a value that better reflects what percentage of the word has been changed.
Is that a valid/proven approach? Or is it plain stupid?
Is Levenshtein distance supposed to be used as an absolute value?
It seems like it would depend on your requirements. (To clarify: Levenshtein distance is an absolute value, but as the OP pointed out, the raw value may not be as useful for a given application as a measure that takes the length of the word into account. This is because we are really more interested in similarity than distance per se.)
I am using both Daitch-Mokotoff soundexing and Damerau-Levenshtein to find out if a user entry and a value in the application are "the same".
Sounds like you're trying to determine whether the user intended their entry to be the same as a given data value?
Are you doing spell-checking? or conforming invalid input to a known set of values?
What are your priorities?
Minimize false positives (try to make sure all suggested words are very "similar", and list of suggestions is short)
Minimize false negatives (try to make sure that the string the user intended is in the list of suggestions, even if it makes the list long)
Maximize average matching accuracy
You might end up using the Levenshtein distance in one way to determine whether a word should be offered in a suggestion list; and another way to determine how to order the suggestion list.
It seems to me, if I've inferred your purpose correctly, that the core thing you want to measure is similarity rather than difference between two strings. As such, you could use Jaro or Jaro-Winkler distance, which takes into account the length of the strings and the number of characters in common:
The Jaro distance dj of two given strings s1 and s2 is

    dj = (m / |s1| + m / |s2| + (m - t) / m) / 3

where:
m is the number of matching characters
t is the number of transpositions

Jaro–Winkler distance uses a prefix scale p which gives more favourable ratings to strings that match from the beginning for a set prefix length l.
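As a quick worked example (my own, not from the quote above, and assuming the common convention that a swapped pair of characters counts as one transposition): for "MARTHA" and "MARHTA", m = 6 and t = 1, so dj = (6/6 + 6/6 + 5/6) / 3 ≈ 0.944.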
The Levenshtein distance is a relative value between two words. Comparing the LD to the length is not relevant, e.g.
cat -> scat = 1 (75% similar??)
difference -> differences = 1 (90% similar??)
Both these pairs have a Levenshtein distance of 1, i.e. they differ by one character, but when compared to their lengths the second pair would appear to be 'more' similar.
I use soundexing to rank words that have the same Levenshtein distance, e.g.
cat and fat both have an LD of 1 relative to kat, but the word is more likely to be kat than fat when using soundex (assuming the word is incorrectly spelt, not incorrectly typed!).
So the short answer is: just use the Levenshtein distance to determine the similarity.
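For what it's worth, here is a small C++ sketch of the "distance / length" normalisation the question describes (names are mine): similarity = 1 - LD / max(|a|, |b|).

#include <string>
#include <vector>
#include <algorithm>

// Standard two-row Levenshtein distance.
int levenshtein(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (size_t j = 0; j <= b.size(); ++j) prev[j] = (int)j;
    for (size_t i = 1; i <= a.size(); ++i) {
        cur[0] = (int)i;
        for (size_t j = 1; j <= b.size(); ++j) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

double relativeSimilarity(const std::string& a, const std::string& b) {
    size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;
    return 1.0 - double(levenshtein(a, b)) / longest;
}

// relativeSimilarity("cat", "scat")               -> 0.75
// relativeSimilarity("difference", "differences") -> ~0.91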