Number contained in an odd number of sets - c++

I have a homework problem which I can only solve in O(max(F)*N) time (N is about 10^5 and F is 10^9), and I hope you can help me. I am given N quadruples of integers (named S, F, a and b). Each quadruple describes a set of numbers in this way: the first a successive numbers, starting from S (inclusive), are in the set. The next b successive numbers are not, then the next a numbers are, and so on, repeating until you reach the upper limit, F. For example, for S=5;F=50;a=1;b=19 the set contains (5,25,45); for S=1;F=10;a=2;b=1 the set contains (1,2,4,5,7,8,10).
I need to find the integer which is contained in an odd number of sets. It is guaranteed that for the given test there is ONLY 1 number which satisfies this condition.
I tried to go through every number between min(S) and max(F) and count how many sets include it; if a number is included in an odd number of sets, it is the answer. As I said, this gives O(F*N), which is too slow, and I have no other idea how to check whether a number is in an odd number of sets.
If you could help me I would be really grateful. Thank you in advance and sorry for my bad English and explanation!

Hint
I would be tempted to use bisection.
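Choose a value x, then count how many numbers <= x are present across all the sets (add up the counts of the individual sets, so a number in k sets is counted k times).
If this total is odd then the answer is <= x, otherwise it is > x.
Each count can be computed in O(1) per set, so this should take O(N log(F)) time.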
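A minimal sketch of the bisection in C++, assuming a >= 1, b >= 0 and S <= F for every quadruple. The names Pattern, countUpTo and findOddOne are made up for illustration. The count for one set comes from full periods of length a + b plus a partial one, so no set is ever materialised.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One input quadruple; the struct name is hypothetical.
struct Pattern { long long S, F, a, b; };

// How many members of the set described by p are <= x.
long long countUpTo(const Pattern& p, long long x) {
    assert(p.a >= 1 && p.b >= 0);            // assumed input constraints
    long long hi = std::min(x, p.F);
    if (hi < p.S) return 0;
    long long len = hi - p.S + 1;            // integers in [S, hi]
    long long period = p.a + p.b;            // a members followed by b gaps
    return (len / period) * p.a + std::min(len % period, p.a);
}

// Smallest x for which the total count of members <= x is odd.
// Relies on the guarantee that exactly one number is in an odd number of sets.
long long findOddOne(const std::vector<Pattern>& sets) {
    assert(!sets.empty());
    long long lo = sets[0].S, hi = sets[0].F;
    for (const Pattern& p : sets) {
        lo = std::min(lo, p.S);
        hi = std::max(hi, p.F);
    }
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        long long total = 0;
        for (const Pattern& p : sets) total += countUpTo(p, mid);
        if (total % 2 != 0) hi = mid;        // the answer is <= mid
        else lo = mid + 1;                   // the answer is > mid
    }
    return lo;
}
```

With N about 10^5 and F about 10^9 the running total stays near 10^14, which fits comfortably in a 64-bit integer.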
Alternative explanation
Suppose we have sets
[S=1,F=8,a=2,b=1]->(1,2,4,5,7,8)
[S=1,F=7,a=1,b=0]->(1,2,3,4,5,6,7)
[S=6,F=8,a=1,b=1]->(6,8)
Then we can table:
N(y) = number of sets that include y,
C(z) = sum(N(y) for y in range(1, z+1)) % 2
y  N(y)  C(y)
1  2     0
2  2     0
3  1     1
4  2     1
5  2     1
6  2     1
7  2     1
8  2     1
And then we use bisection to find the first place where C(z) becomes 1.

Seems like it'd be useful to find a way to perform set operations, particularly intersection, on these sets without having to generate the actual sets. If you could do that, the intersection of all these sets in the test should leave you with just one number. Leaving the a and b part aside, it's easy to see how you'd take the intersection of two sets that include all integers between S and F: the intersection is just the set with S=max(S1, S2) and F=min(F1, F2).
That gives you a starting point; now you have to figure out how to create the intersection of two sets while taking a and b into account.

XOR to the rescue.
Take the numbers from each successive set and XOR them with the contents of the result set. I.e., if the number is currently marked as "present", change that to "not present", and vice versa.
At the end, you'll have one number marked as present in the result set, which will be the one that occurred an odd number of times. All of the others will have been XORed an even number of times, so they'll be back to the original state.
As for complexity, you're dealing with each input item exactly once, so it's basically linear on the total number of input items -- at least assuming your operations on the result set are constant complexity. At least if I understand how they're phrasing things, that seems to meet the requirement.
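The toggling idea could look roughly like this in C++. It is only a sketch: the names Quad and oddOneByToggling are hypothetical, it assumes a >= 1, b >= 0 and S <= F, and it allocates one flag per value between min(S) and max(F), so it is only practical when that range is small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One input quadruple; the struct name is hypothetical.
struct Quad { long long S, F, a, b; };

// Toggle a presence flag for every member of every set; the flag that is
// still set at the end belongs to the number seen an odd number of times.
// Returns max(F) + 1 if no such number exists.
long long oddOneByToggling(const std::vector<Quad>& sets) {
    assert(!sets.empty());
    long long lo = sets[0].S, hi = sets[0].F;
    for (const Quad& q : sets) {
        assert(q.a >= 1 && q.b >= 0 && q.S <= q.F);  // assumed constraints
        lo = std::min(lo, q.S);
        hi = std::max(hi, q.F);
    }
    std::vector<bool> present(static_cast<std::size_t>(hi - lo + 1), false);
    for (const Quad& q : sets) {
        // Each block starts a + b after the previous one and holds a members.
        for (long long start = q.S; start <= q.F; start += q.a + q.b) {
            for (long long v = start; v < start + q.a && v <= q.F; ++v) {
                std::size_t idx = static_cast<std::size_t>(v - lo);
                present[idx] = !present[idx];        // XOR with "present"
            }
        }
    }
    for (long long v = lo; v <= hi; ++v) {
        if (present[static_cast<std::size_t>(v - lo)]) return v;
    }
    return hi + 1;
}
```

The work is proportional to the total number of set members plus the size of the value range, which is why this does not scale to F around 10^9.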

It sounds like S is assumed to be non-negative. Given your desire for an O(max(F)*N) time boundary you can use a sieving-like approach.
Have an array of integers with an entry for each candidate number (that is, every number between min(S) and max(F)). Go through all the quadruples and add 1 to all array locations associated with included numbers represented by each quadruple. At the end, look through the array to see which count is odd. The number it represents is the number that satisfies your conditions.
This works because you're going through N quadruples, and each one takes O(max(F)) or less time (assuming S is always non-negative) to count the included numbers. That gives you O(max(F)*N).

Related

Bit Manipulation: Harder Flipping Coins

Recently, I saw this problem from CodeChef titled 'Flipping Coins' (Link: FLIPCOINS).
Summarily, there are N coins and we must write a program that supports two operations.
1. Flip the coins in range [A,B].
2. Find the number of heads in range [A,B].
Of course, we can quickly use a segment tree (range query, range updates using lazy propagation) to solve this.
However, I faced another similar problem where, after a series of flips (operation 1), we are required to output the resulting state of all the coins (e.g. 100101, where 0 represents heads and 1 represents tails).
More specifically, operation 2 changes from counting the number of heads to printing the resulting state of all N coins. Also, the new operation 2 is only called after all the flips have been done (i.e. operation 2 is the last to be called and is called only once).
How does one solve this? It requires some form of bit manipulation, according to the problem tags.
Edit
I attempted brute-forcing through all queries, and alas, it yielded Time Limit Exceeded.
Printing out the state of the coins can be done using a Binary-indexed tree:
Initially all values are 0.
When we need to flip coins [A, B], we increment the value at index A by 1 and
decrement the value at index B + 1 by 1.
The state of coin i is then the prefix sum at i modulo 2.
This works because the prefix sum at i is always the number of flip operations done at i.
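Here is a small C++ sketch of that idea. The class name FlipTracker and its methods are invented for illustration, and it assumes coins are indexed 0..N-1, ranges are inclusive, and every coin starts in state 0.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Binary-indexed (Fenwick) tree over a difference array of flip counts.
class FlipTracker {
    int n_;
    std::vector<int> tree_;                  // 1-based internally

    void add(int i, int delta) {
        for (; i <= n_; i += i & -i) tree_[i] += delta;
    }
    int prefix(int i) const {                // sum of entries 1..i
        int s = 0;
        for (; i > 0; i -= i & -i) s += tree_[i];
        return s;
    }

public:
    explicit FlipTracker(int size) : n_(size), tree_(size + 1, 0) {}

    // Flip every coin in the inclusive range [a, b].
    void flip(int a, int b) {
        assert(0 <= a && a <= b && b < n_);
        add(a + 1, 1);                       // flips start counting at a
        if (b + 2 <= n_) add(b + 2, -1);     // and stop counting after b
    }

    // State of all coins: the prefix sum at i is the number of flips
    // covering coin i, so its parity is the coin's state.
    std::string state() const {
        std::string out;
        for (int i = 0; i < n_; ++i)
            out += (prefix(i + 1) % 2 != 0) ? '1' : '0';
        return out;
    }
};
```

Because the state is only read once at the very end, a plain difference array followed by a single running-sum pass would also do; the tree only pays off if coins have to be inspected between flips.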

The mean of the means of all r-combinations of X1...Xn is the mean of X1...Xn

I have X1...X6. I took all combinations of two of them. For each of those sub-samples I took the mean, and then the mean of all of those means:
[(X1+X2)/2 + ... +(X5+X6)/2]/15, where 15 is the total number of combinations.
Now the mean of all of those sub-sample means is equal to the mean of the full sample:
(X1+X2+X3+X4+X5+X6)/6 .
I am asking for help either to PROVE this (as a generalization) or to understand why it happens, because even if I change the combination size, for example combinations of 6 taken 3 or 4 at a time, the result is the same.
Thank you
OK, here's a quick page of scribbles that shows that, no matter how many items you have, if you take the mean of every combination of 2 items (every pair) and then take the mean of those means, you will always get the mean of the original numbers.
Explanation...
I work out what the number of combinations is first. For later use.
Then it's just a matter of simplifying the calculation.
Each number is used n-1 times. X1 is obvious. X2 is used n-2 times but also used once in the sum with X1. (This bit is a bit harder with r > 2)
At the end I substitute in the actual values for the number of combinations.
This then cancels out to give the sum of all the numbers over n. Which is the mean.
The next step is to show this for all values r but that shouldn't be too hard.
Substituting r instead of 2. I found that each number is used (n-1) choose (r-1) times.
But then I'm getting the wrong cancellation out of it.
I know where I went wrong... I miscalculated the calculation for (n-1)choose(r-1)
With the correct formula the answer falls out to S/n.
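Written out, the general argument could look like the following, where S = X_1 + ... + X_n and the outer sum runs over all r-element subsets A of {1, ..., n} (this notation is mine, not from the scribbles). Each X_i appears in exactly (n-1) choose (r-1) of those subsets:

```latex
\frac{1}{\binom{n}{r}} \sum_{|A| = r} \frac{1}{r} \sum_{i \in A} X_i
  = \frac{\binom{n-1}{r-1}}{r \binom{n}{r}} \sum_{i=1}^{n} X_i
  = \frac{S}{n},
\qquad \text{because } \binom{n}{r} = \frac{n}{r} \binom{n-1}{r-1}.
```

For n = 6 and r = 2 this gives 5 / (2 * 15) = 1/6, matching the example in the question.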

Possible combination to form a given number

How can I check whether a given number can be formed as a positive integral combination of a given list of numbers?
For example, if the list of numbers is
5 3 9
and
13
Then 13 can be formed as 5*2 + 3. What is a possible algorithm for this? This is not a homework question. It was asked in an interview that I am preparing for. Please help!
I did this decades ago for combos of six numbers, (Countdown numbers game). If the set of numbers is in a global array, all you need to pass down through each recursion is one integer index that describes how far along the array you have examined so far.
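A rough C++ sketch of that recursion for the question above, passing the index down as described. The function name canForm is hypothetical, and I am assuming all list entries are positive and that each may be used zero or more times (the example uses 9 zero times). The list is passed by reference rather than kept in a global array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Can `target` be written as a sum of non-negative multiples of
// nums[idx], nums[idx + 1], ...?  idx records how far along the
// list we have already decided.
bool canForm(const std::vector<int>& nums, std::size_t idx, long long target) {
    if (target == 0) return true;            // nothing left to build
    if (idx == nums.size()) return false;    // ran out of numbers
    assert(nums[idx] > 0);                   // assumed: positive entries only
    // Try using nums[idx] zero times, once, twice, ... while it still fits.
    for (long long rest = target; rest >= 0; rest -= nums[idx]) {
        if (canForm(nums, idx + 1, rest)) return true;
    }
    return false;
}
```

Calling canForm(list, 0, 13) with the list 5 3 9 finds 5*2 + 3. The search is exponential in the worst case; for large targets a table of reachable sums (as in the coin-change problem) would be the usual alternative.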

Smith-Waterman algorithm: choosing more than one alignment

I want to align a small sequence S1 to another larger nucleotide sequence S2 for example:
S1: acgtgt
S2: ttcgtgacagt...
In this example S1 hits in two places in S2: cgtg, and acgt with a gap in S2. I want to use the Smith-Waterman algorithm, but my question is: if the two alignments have two different scores, e.g. one 4 and the other 3, how do I get both alignments from the dynamic programming matrix? Is there any tool or library that does this already? I tried pairwise2 from Biopython, and it only gives the alignments with the highest score in the matrix.
Pairwise alignment algorithms such as Smith-Waterman will only provide the one best alignment. A worse alignment will have a different traceback walk that will not be followed by the Dynamic Programming algorithm Smith-Waterman uses.
If there are multiple alignments with the same best score, S-W will choose only one of those alignments (which one is implementation specific since it doesn't really matter since they have the same score).
If you really really wanted to have multiple alignments returned AND use something like Smith-Waterman, you will have to re-align the sequences multiple times each time configuring the gap penalties differently. I do not recommend this since it will be very expensive.
Instead of using Smith-Waterman, you may want to try something like BLAST, which will give you multiple hits.
See the section "Repeated matches" in Durbin et al., Biological Sequence Analysis:
Let us assume that we are only interested in matches scoring higher
than some threshold T . This will be true in general, because there
are always short local alignments with small positive scores even
between entirely unrelated sequences. Let y be the sequence containing
the domain or motif, and x be the sequence in which we are looking for
multiple matches.
An example of the repeat algorithm is given in Figure 2.7. We again
use the matrix F, but the recurrence is now different, as is the
meaning of F(i, j). In the final alignment, x will be partitioned into
regions that match parts of y in gapped alignments, and regions that
are unmatched. We will talk about the score
of a completed match region as being its standard gapped alignment
score minus the threshold T . All these match scores will be positive.
F(i, j) for j ≥ 1 is now the best sum of match scores to x1...i,
assuming that xi is in a matched region, and the corresponding match
ends in xi and yj (they may not actually be aligned, if this is a
gapped section of the match). F(i,0) is the best sum of completed
match scores to the subsequence x1...i, i.e. assuming that xi is in an
unmatched region.
To achieve the desired goal, we start by
initialising F(0,0) = 0 as usual, and then fill the matrix using the
following recurrence relations:
Equation (2.11) handles unmatched regions and ends of matches, only
allowing matches to end when they have score at least T . Equation
(2.12) handles starts of matches and extensions. The total score of
all the matches is obtained by adding an extra cell to the matrix, F(n
+ 1,0), using (2.11). This score will have T subtracted for each match; if there were no matches of score greater than T it will be 0,
obtained by repeated application of the first option in (2.11).
The
individual match alignments can be obtained by tracing back from cell
(n + 1,0) to (0,0), at each point going back to the cell that was the
source of the score in the current cell in the max() operation. This
traceback procedure is a global procedure, showing what each residue
in x will be aligned to. The resulting global alignment will contain
sections of more conventional gapped local alignments of subsequences
of x to subsequences of y.
Note that the algorithm obtains all the
local matches in one pass. It finds the maximal scoring set of
matches, in the sense of maximising the combined total of the excess
of each match score above the threshold T . Changing the value of T
changes what the algorithm finds. Increasing T may exclude matches.
Decreasing it may split them, as well as finding new weaker ones. A
locally optimal match in the sense of the preceding section will be
split into pieces if it contains internal subalignments scoring less
than −T . However, this may be what is wanted: given two similar high
scoring sections significant in their own right, separated by a
non-matching section with a strongly negative score, it is not clear
whether it is preferable to report one match or two.
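The recurrence relations (2.11) and (2.12) that the quoted passage refers to are not reproduced above. As far as I recall them from the book, they have the following form, where s(a, b) is the substitution score, d is a linear gap penalty and m is the length of y (those three symbols are assumed from the book's earlier sections, not from the quote):

```latex
F(i, 0) = \max
\begin{cases}
  F(i-1, 0), \\
  F(i-1, j) - T, & j = 1, \dots, m;
\end{cases}
\tag{2.11}

F(i, j) = \max
\begin{cases}
  F(i, 0), \\
  F(i-1, j-1) + s(x_i, y_j), \\
  F(i-1, j) - d, \\
  F(i, j-1) - d.
\end{cases}
\tag{2.12}
```

The first option in (2.11) carries an unmatched position forward unchanged, which is why a run with no match above T ends with a total of 0.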
All possible alignments that conform to the scoring in the substitution matrix are represented in the traceback matrix T - it's just that some implementations might not give you access to T.
To extract multiple alignments, you'll first need to look at the scoring matrix H and choose which scores you want to trace back - for example, you might look at the highest 10 scores. The values in the matrix T will tell you the route to trace back. Keep going until the corresponding score in H is zero.
Be careful though - the 10 highest scores might all be part of the same alignment, in which case you'd just get results that are sub-alignments of another result. To avoid this, it's probably best to trace back the highest scoring alignment first, and then look for high values in cells that are not passed through by the first alignment.
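As an illustration only, the procedure might be sketched in C++ like this. The names Hit and localHits, the half-open coordinates, and the scoring values (match +2, mismatch -1, linear gap -2) are all assumptions of mine. The sketch is slightly stricter than the description: it drops any traceback that touches a cell already used by a better-scoring alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One local alignment: a[startA, endA) aligned to b[startB, endB).
struct Hit { int score, startA, endA, startB, endB; };

std::vector<Hit> localHits(const std::string& a, const std::string& b,
                           int minScore, int match = 2, int mismatch = -1,
                           int gap = -2) {
    assert(minScore > 0);
    const int n = static_cast<int>(a.size()), m = static_cast<int>(b.size());
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    // Traceback codes: 0 = stop, 1 = diagonal, 2 = up, 3 = left.
    std::vector<std::vector<char>> T(n + 1, std::vector<char>(m + 1, 0));
    struct Cell { int score, i, j; };
    std::vector<Cell> cells;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            int diag = H[i-1][j-1] + (a[i-1] == b[j-1] ? match : mismatch);
            int up = H[i-1][j] + gap, left = H[i][j-1] + gap;
            int best = std::max({0, diag, up, left});
            H[i][j] = best;
            T[i][j] = best == 0 ? 0 : best == diag ? 1 : best == up ? 2 : 3;
            if (best >= minScore) cells.push_back({best, i, j});
        }
    }
    // Highest score first; ties broken by position to keep the order stable.
    std::sort(cells.begin(), cells.end(), [](const Cell& x, const Cell& y) {
        if (x.score != y.score) return x.score > y.score;
        if (x.j != y.j) return x.j < y.j;
        return x.i < y.i;
    });
    std::vector<std::vector<char>> used(n + 1, std::vector<char>(m + 1, 0));
    std::vector<Hit> hits;
    for (const Cell& c : cells) {
        std::vector<std::pair<int, int>> path;
        int i = c.i, j = c.j;
        bool clash = false;
        while (H[i][j] > 0) {                // trace back until the score is 0
            if (used[i][j]) { clash = true; break; }
            path.push_back({i, j});
            if (T[i][j] == 1) { --i; --j; }
            else if (T[i][j] == 2) { --i; }
            else { --j; }
        }
        if (clash) continue;                 // overlaps a better alignment
        for (const auto& p : path) used[p.first][p.second] = 1;
        hits.push_back({c.score, i, c.i, j, c.j});
    }
    return hits;
}
```

Choosing minScore plays a role similar to the threshold in the repeated-matches approach: set it too low and many short chance matches are reported.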

Fastest way to find sum of digits on big numbers

I have some big numbers (again) and I need to find out whether the sum of the digits is an even number.
I tried this: finding the sum of the digits with a while loop and then checking whether that sum % 2 equals 0. It works, but it's too slow for big numbers, because I am given intervals of numbers, and if the input is 1999999 19999999999 then my program fails: I cannot complete within the time limit, which is 0.1 s.
What should I do? Is there a faster way to do this?
EDIT: The input 1999999 19999999999 means it will start with 1999999 and check all the numbers as I wrote above until 19999999999, and because we are talking about big numbers (< 2^30) my program is not up to the task.
You don't need to sum the digits. Think about it. The sum starts with zero, which is generally regarded as even (although you can special case this if you want).
Each even digit changes nothing. If the sum was odd, it stays odd, if it was even it stays even.
Each odd digit changes the sum from even to odd, or odd to even.
So, just count the number of odd digits. If that count is even, then the sum of all the digits is even. If that count is odd, then the sum of all the digits is odd.
Now, you only need to do this for the FIRST number in your range. What you need to do next is figure out how the evenness or oddness of the numbers change as you keep adding one.
I leave this as an exercise for the reader. Homework has to involve some work!
Hint: if you find that the sum of the digits of a given number n is odd, will the sum of the digits of the number n + 1 be odd or even?
Update: as @Mark pointed out, it is not so simple... but the anomalies appear only when n + 1 is a multiple of 10, i.e. (n + 1) % 10 == 0. Then the parity does not change. However, out of these cases, every 10th is an exception where the parity does still change (e.g. 199 -> 200). And so on... basically, the number of trailing 9s in n decides whether the parity changes between n and n + 1: each trailing 9 that rolls over to 0 lowers the digit sum by 9, so the parity flips exactly when n ends in an even number of 9s (including none). I admit it is a bit tedious to calculate, but still I am sure it is faster than just adding up all these digits...
Here is a hint, it may work -- you don't need to sum the digits, you just need to know if the result will be odd or even -- if you start with the assumption that your total is even, even digits have no effect and odd digits toggle it (i.e. an odd number of odd digits makes it odd).
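To make the two hints concrete, here is a hedged C++ sketch. The function names are made up, and I am guessing that the task is to count the numbers in the interval whose digit sum is even. The parity of the first number comes from counting odd digits; after that it is updated from the number of trailing 9s instead of being recomputed.

```cpp
#include <cassert>
#include <cstdint>

// Parity of the digit sum without summing: each odd digit toggles it.
bool digitSumIsEven(std::uint64_t n) {
    bool even = true;                        // a sum of 0 counts as even
    for (; n > 0; n /= 10) {
        if ((n % 10) % 2 != 0) even = !even;
    }
    return even;
}

// Parity for n + 1 given the parity for n. Going from n to n + 1 changes
// the digit sum by 1 - 9k, where k is the number of trailing 9s of n, so
// the parity flips exactly when k is even.
bool nextIsEven(std::uint64_t n, bool nIsEven) {
    int trailingNines = 0;
    for (; n % 10 == 9; n /= 10) ++trailingNines;
    return (trailingNines % 2 == 0) ? !nIsEven : nIsEven;
}

// Count the numbers in [lo, hi] whose digit sum is even.
std::uint64_t countEvenDigitSums(std::uint64_t lo, std::uint64_t hi) {
    assert(lo <= hi);
    bool even = digitSumIsEven(lo);
    std::uint64_t count = even ? 1 : 0;
    for (std::uint64_t n = lo; n < hi; ++n) {
        even = nextIsEven(n, even);
        if (even) ++count;
    }
    return count;
}
```

This still visits every number in the interval, so it only trims the per-number cost; for an interval of about 2 * 10^10 numbers a closed-form count would be needed to fit a 0.1 s limit.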
Depending on the language there may be a faster way to perform the calculation without adding.
Also remember -- a number is odd or even based on its last binary digit.
Example:
In ASM you could XOR the low order bit to get the correct result
In FORTH this would not work so well...