Answering Queries on Binary array - c++

Given a binary array of length <= 10^5 and roughly as many queries. Each query is a pair of integers (l, r), and for each query we have to compute the total number of consecutive 0's and 1's in the range [l, r], that is, the number of adjacent pairs of equal digits inside the range.
If n is the length of the array, then 1 <= l < r <= n.
For example:
if the binary array (1-indexed) is "011000" and say there are 5 queries:
1 3
5 6
1 5
3 6
3 4
Then the required answer is
1
1
2
2
0
I am aware that this can be solved by a linear time (worst case) algorithm for each query but due to the large number of queries it's not feasible.
Just wondering which is the most efficient way to achieve this?

You can do it with O(n) space complexity and O(log(n)) search time for each query. Calculate the counts for windows of size 1, 2, 4, .... For a given query you can find O(log(n)) windows (at most 2 windows of a particular size), summing which you can find your answer.

As Dukeling said in the comments, you can preprocess in O(n) to compute an array B where B[x] contains the total number of consecutive digits seen in [1..x].
This allows a query in O(1) to find the number of consecutive digits in the range [l,r]: use the array to look up the total in the range [1,r] and subtract the total in the range [1,l].
Python code:
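def preprocess(A):
    # B[x] = number of adjacent equal pairs within the first x elements
    last = A[0]
    B = [0, 0]
    num_consecutive = 0
    for a in A[1:]:
        if a == last:
            num_consecutive += 1
        B.append(num_consecutive)
        last = a
    return B

def query(B, l, r):
    # pairs in [l, r] = pairs in [1, r] minus pairs in [1, l]
    return B[r] - B[l]

A = [0, 1, 1, 0, 0, 0]
B = preprocess(A)
print(query(B, 1, 3))
print(query(B, 5, 6))
print(query(B, 1, 5))
print(query(B, 3, 6))
print(query(B, 3, 4))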

Regular Expression for Binary Numbers Divisible by 5

I want to write a regular expression for Binary Numbers Divisible by 5.
I have already done the regular expressions for Binary Numbers Divisible by 2 and for 3 but I couldn't find one for 5.
Any suggestions?
(0|1(10)*(0|11)(01*01|01*00(10)*(0|11))*1)*
Anchor it with ^ and $ to test it with a regex engine.
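A quick way to gain confidence in the expression is to compare it against plain arithmetic. The sketch below assumes Python's re module and an arbitrary upper bound of 2000 (both are choices made here, not part of the answer); is_multiple_of_5 is a hypothetical helper name.

```python
import re

# The expression from the answer; fullmatch() supplies the ^...$ anchoring.
DIV5 = re.compile(r"(0|1(10)*(0|11)(01*01|01*00(10)*(0|11))*1)*")

def is_multiple_of_5(bits):
    """Return True if the binary string `bits` matches the expression."""
    return DIV5.fullmatch(bits) is not None

# Compare the regex with n % 5 for every n below an arbitrary bound.
mismatches = [n for n in range(1, 2000)
              if is_multiple_of_5(format(n, "b")) != (n % 5 == 0)]
print(mismatches)
```

An empty list means the expression and the arithmetic agree on every tested value.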
You can build a DFA and convert it to regular expression. The DFA was already built in another answer. You can read it, it is very well explained.
The general idea is to remove nodes, adding edges.
Becomes:
Using this transformation and the DFA from the answer I linked, here are the steps to get the regular expression:
(EDIT: Note that the labels "Q3" and "Q4" have been mistakenly swapped in the diagram. These states represent the remainder after modulus 5.)
2^0 = 1 = 1 mod 5
2^1 = 2 = 2 mod 5
2^2 = 4 = -1 mod 5
2^3 = 8 = -2 mod 5
2^4 = 16 = 1 mod 5
2^5 = 32 = 2 mod 5
... -1 mod 5
... -2 mod 5
So we have a repeating 1, 2, -1, -2 pattern. It splits into two subpatterns in which only the sign alternates. Let n be the digit position, with the least significant digit at position 0. For even positions n = 2k the weight is
(-1)^k
and for odd positions n = 2k+1 the weight is
2x((-1)^k)
So, how to use this?
Let the original number be 100011. Divide its digits into two parts, those at even positions and those at odd positions. Sum each part separately with alternating signs, starting with + at the rightmost digit of the part. Multiply the sum of the odd-position digits by 2 and add the sum of the even-position digits. If the result is divisible by 5, then the original number is divisible by 5; otherwise it is not. Example:
100011
_0_0_1 even positions: 1-0+0 = 1
1_0_1_ odd positions: 1-0+1 = 2; 2x2 = 4
1+4 = 5; 5 mod 5 equals 0? Yes. Therefore, the original number is divisible.
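The 1, 2, -1, -2 weight pattern from the table of powers can also be checked numerically. This is a minimal sketch, assuming the least significant bit has position 0; weighted_sum and divisible_by_5 are hypothetical helper names, not something from the answer.

```python
WEIGHTS = (1, 2, -1, -2)  # 2**n mod 5 for n = 0, 1, 2, 3, then it repeats

def weighted_sum(bits):
    """Sum the weights of the 1-bits, counting positions from the right."""
    return sum(WEIGHTS[n % 4]
               for n, b in enumerate(reversed(bits)) if b == "1")

def divisible_by_5(bits):
    """A binary string is a multiple of 5 iff its weighted sum is."""
    return weighted_sum(bits) % 5 == 0

print(weighted_sum("100011"), divisible_by_5("100011"))  # 5 True
```

For 100011 the set bits sit at positions 0, 1 and 5, so the weights are 1 + 2 + 2 = 5.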
How to apply it within a regex? Using callout functions within a regex it can be applied. Callouts provide a means of temporarily passing control to the script in the middle of regular expression pattern matching.
However, ndn's answer is more appropriate and easier, therefore I recommend to use his answer.
However, "^(0|1(10)*(0|11)(01*01|01*00(10)*(0|11))*1)*$" also matches the empty string.

how to map a specialized string into specified integer

I am doing some financial trading work. I have a set of stock symbols that follow a very clear pattern:
each is composed of two characters (AB, AC, AD, ...) followed by the current month as a four-digit number (1503, 1504, 1505). Some examples are:
AB1504
AB1505
AC1504
AC1505
AD1504
AD1505
....
Since these strings follow such a regular pattern, I want to map (hash) each string to a unique integer so that I can use the integer as an array index for fast access. I have a lot of retrievals inside my system, and std::unordered_map or any other hash map is not fast enough. My tests show that general hash maps have latencies at the hundreds-of-nanoseconds level, while array indexing is always under 100 nanoseconds.
My ideal case would be, for example, that AB1504 maps to integer 1, AB1505
maps to 2, and so on; then I can create an array to access the information related to these symbols much faster.
I'm trying to figure out a hash algorithm or some other method that can achieve this, but I couldn't find one.
Do you have any suggestions on this problem?
You can regard the string as a variable-base number representation, and convert that to an integer. For example:
AC1504:
A (range: A-Z)
C (range: A-Z)
15 (range: 0-99)
04 (range: 1-12)
Extract the parts; then a hash function could be
int part1, part2, part3, part4;
...
part1 -= 'A';
part2 -= 'A';
part4 -= 1;
return ((part1 * 26 + part2) * 100 + part3) * 12 + part4;
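The same arithmetic can be sketched end to end, including the parsing step that the snippet elides. This version is written in Python purely for illustration; symbol_index, TABLE_SIZE and table are hypothetical names, and the input is assumed to be exactly two uppercase letters followed by a two-digit year and a two-digit month.

```python
def symbol_index(symbol):
    """Map 'AAyymm' (e.g. 'AC1504') to an integer in [0, 26*26*100*12)."""
    part1 = ord(symbol[0]) - ord("A")   # first letter, 0..25
    part2 = ord(symbol[1]) - ord("A")   # second letter, 0..25
    part3 = int(symbol[2:4])            # year, 0..99
    part4 = int(symbol[4:6]) - 1        # month, shifted to 0..11
    return ((part1 * 26 + part2) * 100 + part3) * 12 + part4

TABLE_SIZE = 26 * 26 * 100 * 12         # 811200 slots
table = [None] * TABLE_SIZE             # flat array indexed by symbol_index
table[symbol_index("AB1504")] = "data for AB1504"
print(symbol_index("AB1504"), symbol_index("AB1505"))  # 1383 1384
```

Consecutive months of the same prefix land in adjacent slots, which keeps lookups for one instrument close together in memory.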
The following values should be representable by a 32-bit integer:
XYnnnn => (26 * X + Y) * 10000 + nnnn
Here X and Y take values in the range [0, 26), and n takes values in the range [0, 10).
You have a total of 6,760,000 representable values, so if you only want to associate a small amount of data with it (e.g. a count or a pointer), you can just make a flat array, where each symbol occupies one array entry.
If you parse the string as a mixed-base number, first 2 base-26 digits and then 4 base-10 digits, you will quickly get a unique index for each string. The only issue is that you might get a sparsely populated array.
You can always reorder the digits when calculating the index to minimize the issue mentioned above.
As the numbers are actually months, I would calculate the number of months from the first entry and combine that with the 2-digit base-26 number from the prefix.
Hope you can make some sense from this, typing on my tablet at the moment. :D
I assume the format is 'AAyymm', where A is an uppercase character yy a two digit year and mm the two digit month.
Hence you can map it to 10 (AA) + Y (yy) + 4 (mm) bits, where Y = 32 - 10 - 4 = 18 bits for a 32-bit representation (or 262144 years).
Having that, you can represent the format as an integer by shifting the characters to their place and shifting the year and month pairs to their places after converting these to integers.
Note: There will always be gaps in the binary representation: here in the 5+5 bit representation of the characters (6 + 6 unused values) and in the 4-bit month representation (4 unused values).
To avoid the gaps, change the representation to ABmmmm, where the pair AB is represented by the number 26*A+B and mmmm is the month relative to some zero month in some year (which covers 2^32/1024/12 = 349525 years with 32 bits).
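A gap-free index along these lines might look like the sketch below. The zero month (January of year 15), the choice to put the month count in the high part of the index, and the name dense_index are all assumptions made for illustration.

```python
BASE_YEAR, BASE_MONTH = 15, 1   # assumed "zero month": January of year 15
PREFIXES = 26 * 26              # number of two-letter prefixes

def dense_index(symbol):
    """Map 'AAyymm' to months_since_base * 676 + (26*A + B)."""
    prefix = (ord(symbol[0]) - ord("A")) * 26 + (ord(symbol[1]) - ord("A"))
    year, month = int(symbol[2:4]), int(symbol[4:6])
    months = (year - BASE_YEAR) * 12 + (month - BASE_MONTH)
    return months * PREFIXES + prefix

print(dense_index("AB1504"), dense_index("AC1504"), dense_index("AB1505"))
# 2029 2030 2705
```

With the month count in the high part, every index below the current month's block is in use, and the array only needs to grow by 676 entries per new month.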
However, you might consider a split of stock symbols and time. Combining two values in one field is usually troublesome (It might be a good storage format, but no good 'program data format').

Python Range() for positive and negative numbers

Okay, so I'm trying to write a simple program that gives me both the positive
and negative range of the number given by the user.
For example, if the user gives the number 3, then the program should print out
-3 -2 -1 0 1 2 3
I can't work out how to get the negative part of the range.
The code I have below only gives me the positive part, so I was
wondering what I need to change to make it output both the positive and
negative values.
s = int(input("Enter a number "))
for i in range(s+1):
    print(i+1)
Range() can take two parameters: range(start, end), where start is inclusive, and end is exclusive. So the range you want for 3 is: range(-3, 4). To make it general:
s = int(input("Enter a number "))
for i in range(-s, s+1):
    print(i)

Equivalence Classes in software testing

I am new to Software testing and I am studying the basic techniques. I read the following problem:
Identify the Equivalence Classes for the following specification: The program accepts five to nine inputs which are 3 digit integers greater than 100.
I think that it doesn't matter how many inputs this program has, and the equivalence class is {99,100,101}. Am I right or not?
After the comments, I think the classes are:
1. (-∞, 99)
2. [100]
3. (101, 999)
4. (1000, +∞)
Inputs:
0-4 inputs
5-9 inputs
More than 9 inputs
Values:
0-100
101-999
Greater than 999
The program accepts when there are between 5 and 9 inputs and each input value is a 3-digit number between 101 and 999.
I suggest you use PICT for generating effective combinations to test.
Take a look at http://msdn.microsoft.com/en-us/magazine/ee819137.aspx
The tool can be downloaded from http://download.microsoft.com/download/f/5/5/f55484df-8494-48fa-8dbd-8c6f76cc014b/pict33.msi
You can look for similar tools at http://pairwise.org/tools.asp
equivalence classes for your problem are:
set of numbers that are not three digit and greater than hundred...
set of numbers that are less than one hundred
set of numbers that are greater than one hundred and less than 999
set consisting of the number 100
1 0<x<100 , value of x should contain 0-4
2 101<x<999 , value of x should contain 5-9
3 X>999 , value of x should be 0-9
Following should be the classes:
Inputs:
[0 - 4] invalid class
[5 - 9] valid class
[More than 9] invalid class
Values:
[100 or less] invalid class
[101 to 999] valid class
[1000 or more] invalid class
Again a Decision table should be used to find out the valid combination of Inputs and Values.
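The classes can be exercised with one representative value each. In this sketch, accepts is a hypothetical stand-in for the program under test, written directly from the specification (five to nine inputs, each a 3-digit integer greater than 100), and the representative numbers are arbitrary picks from each class.

```python
def accepts(values):
    """Spec: five to nine inputs, each a 3-digit integer greater than 100."""
    return 5 <= len(values) <= 9 and all(101 <= v <= 999 for v in values)

# One representative per class; the chosen numbers are arbitrary.
count_classes = {"0-4 inputs": 3, "5-9 inputs": 7, "more than 9 inputs": 12}
value_classes = {"100 or less": 50, "101-999": 500, "1000 or more": 1500}

# Try every combination of an input-count class with a value class.
for c_name, count in count_classes.items():
    for v_name, value in value_classes.items():
        print(c_name, "|", v_name, "->", accepts([value] * count))
```

Only the combination of the two valid classes should be accepted; boundary values such as 100, 101, 999 and 1000 are worth adding as separate cases.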

Adding one digit (0-9) to the sequence/string creates new 4 digits number

I'm trying to find an algorithm which "breaks the safe" by typing the keys 0-9. The code is 4 digits long. The safe opens as soon as it identifies the code as a substring of the typed input. That is, if the code is "3456", then typing "123456" opens the safe. (In other words, the safe does not reset after every 4 key presses.)
Is there an algorithm in which every added digit creates a new 4-digit number (a new combination of the last 4 digits of the sequence/string)?
thanks, km.
Edit (I posted this years ago):
The question is how to make sure that every time I enter an input (one digit) into the safe, I generate a new 4-digit code that was not generated before. For example, if the safe takes a binary code 3 digits long, then this should be my input sequence:
0001011100
Because for every input I get a new code (3 digit long) that was not generated before:
000 -> 000
1 -> 001
0 -> 010
1 -> 101
1 -> 011
1 -> 111
0 -> 110
0 -> 100
I found a reduction to your problem:
Let's define a directed graph G = (V,E) in the following way:
V = {all possible combinations of the code}.
E = {< u,v > | v can be obtained from u by adding 1 digit (at the end), and delete the first digit}.
|V| = 10^4.
Din and Dout of every vertex equal to 10 => |E| = 10^5.
You need to prove that there is Hamilton cycle in G - if you do, you can prove the existence of a solution.
EDIT1:
The algorithm:
1. Construct directed graph G as mentioned above.
2. Calculate Hamilton cycle - {v1,v2,..,vn-1,v1}.
3. Press every number in v1.
4. X <- v1.
5. While the safe isn't open:
5.1 X <- next vertex in the Hamilton path after X.
5.2 Press the last digit in X.
We can see that because we use Hamilton cycle, we never repeat the same substring. (The last 4 presses).
EDIT2:
Of course Hamilton path is sufficient.
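A sequence with this property is known as a de Bruijn sequence. The sketch below does not search for a Hamilton cycle in G; it swaps in the standard recursive construction based on Lyndon words, which produces the same kind of sequence directly. The name key_presses is a hypothetical wrapper that unrolls the cyclic sequence into a linear list of key presses.

```python
def de_bruijn(k, n):
    """Cyclic de Bruijn sequence over digits 0..k-1 for codes of length n."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            # a[1..p] is a Lyndon word; keep it if its length divides n
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence

def key_presses(k, n):
    """Unroll the cycle: repeat the first n-1 digits at the end."""
    seq = de_bruijn(k, n)
    return "".join(str(d) for d in seq + seq[:n - 1])

print(key_presses(2, 3))        # 0001011100
print(len(key_presses(10, 4)))  # 10003
```

For the 4-digit safe this gives 10003 key presses, which is the minimum possible: 4 presses for the first code and one more for each of the remaining 9999 codes.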
Here in summary is the problem I think you are trying to solve and some explanation of how I might approach solving it. http://www.cs.swan.ac.uk/~csharold/cpp/SPAEcpp.pdf
You have to do some finessing to make it fit into the Chinese Postman Problem, however...
Imagine solving this problem for the binary digits, three digits strings. Assume you have the first two digits, and ask your self what are my options to move to? (In regards to the next two digit string?)
You are left with a Directed Graph.
/-\
/ V
\- 00 ----> 01
^ / ^|
\/ ||
/\ ||
V \ |V
/-- 11 ---> 10
\ ^
\-/
Solve the Chinese Postman, you will have all combinations and will form one string
The question is now: is the Chinese Postman Problem solvable? There are algorithms which determine whether or not a directed graph is solvable for the CPP, but I don't know if this particular graph is necessarily solvable based on the problem alone. That would be a good thing to determine. You do, however, know that you could find out algorithmically whether it is solvable, and if it is, you could solve it using the algorithms in that paper (I think) and online.
Every vertex here has 2 incoming edges and 2 outgoing edges.
There are 4 (2^2) vertexes.
In the full-sized problem there are 1000 (10^3) vertices, and every vertex has 10 outgoing and 10 incoming edges. There would be a total of
1000 (10^3) x 10 = 10000 edges in your graph.
Approach to solution:
1.) Create list of all 3 digit numbers 000 to 999.
2.) Create edges for all numbers such that last two digits of first number match first
two digits of next number.
ie 123 -> 238 (valid edge) 123 -> 128 (invalid edge)
3.) use Chinese Postman solving algorithmic approaches to discover if solvable and
solve
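In this particular graph every vertex has as many incoming as outgoing edges and the graph is strongly connected, so an Eulerian circuit exists and the Chinese Postman tour is simply that circuit. The sketch below uses Hierholzer's algorithm to find it; that choice of algorithm and the name euler_presses are assumptions made here, not something taken from the linked paper.

```python
def euler_presses(k, n):
    """Key presses covering every n-digit code over digits 0..k-1 (n >= 2)."""
    digits = [str(d) for d in range(k)]
    start = "0" * (n - 1)        # vertices are (n-1)-digit strings
    next_edge = {}               # vertex -> index of its next unused edge
    stack, circuit = [start], []

    # Iterative Hierholzer: follow unused edges, back out when stuck.
    while stack:
        v = stack[-1]
        i = next_edge.get(v, 0)
        if i < k:
            next_edge[v] = i + 1
            stack.append(v[1:] + digits[i])  # edge = append one digit
        else:
            circuit.append(stack.pop())

    circuit.reverse()
    # Type the start vertex, then the last digit of each vertex visited.
    return start + "".join(v[-1] for v in circuit[1:])

s = euler_presses(10, 4)
print(len(s), len({s[i:i + 4] for i in range(len(s) - 3)}))  # 10003 10000
```

Each edge of the graph corresponds to one 4-digit code, so walking every edge exactly once types every code exactly once.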
I would create an array of subsequences which needs to be updated upon any insertion of a new digit. So in your example, it will start as:
array = {1}
then
array = {1,2,12}
then
array = {1,2,12,3,13,23,123}
then
array = {1,2,12,3,13,23,123,4,14,24,124,34,134,234,1234}
and when you have a sequence that is already at length 4, you don't need to continue the concatenation; just remove the 1st digit of the sequence and insert the new digit at the end. For example, take the last item 1234: when we add 5 it becomes 2345, as follows:
array = {1,2,12,3,13,23,123,4,14,24,124,34,134,234,1234,5,15,25,125,35,135,235,1235,45,145,245,1245,345,1345,2345,2345}
I believe that this is not a very complicated way of going over all the sub-sequences of a given sequence.