Intersection between vectors [closed] - c++

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have data in the following form.
vector<pair<unsigned,unsigned> > vecA; //here the first value denotes the resolution and the second value denotes the value. All values are 4 bits wide
vecA.push_back(make_pair(2,2)); vecA.push_back(make_pair(2,3)); vecA.push_back(make_pair(3,6)); vecA.push_back(make_pair(3,7)); vecA.push_back(make_pair(4,5));
(2,2)-> signifies that the first 2 bits of value(a 4 bit number) are 10. i.e. the value could be "1000,1001,1010,1011" in binary
(2,3)-> signifies that the first 2 bits of value(a 4 bit number) are 11 i.e. the value could be "1100,1101,1110,1111" in binary
(3,6)-> signifies that the first 3 bits of value(a 4 bit number) are 110 i.e., the value could be "1100,1101" in binary
(3,7)-> signifies that the first 3 bits of value(a 4 bit number) are 111 i.e., the value could be "1110,1111" in binary
(4,5)-> signifies that the first 4 bits of value(a 4 bit number) are 0101 i.e., the value is "0101" in binary
I have another vector containing the following:
vector<unsigned> vecB; //vecB has a by default resolution of 4. Here too the values are of 4 bits
vecB.push_back(10); vecB.push_back(6); vecB.push_back(13); vecB.push_back(12); vecB.push_back(15); vecB.push_back(5); vecB.push_back(7);
10-> signifies that the 4 bit number is: "1010"
6-> signifies that the 4 bit number is: "0110"
13-> signifies that the 4 bit number is: "1101"
12-> signifies that the 4 bit number is: "1100"
15-> signifies that the 4 bit number is: "1111", etc.
Now the intersection between vecA and vecB should perform a bit level comparison i.e. for 2 bit resolution of vecA just the first two bits of vecB should be seen.
i.e. (2,2) of vecA matches with "10" of vecB
(2,3) of vecA matches with "13,12,15" of vecB
(3,6) of vecA matches with "12,13" of vecB
(3,7) of vecA matches with "15" of vecB
(4,5) matches with "5" of vecB
The intersection should only return the matching values from vecB. i.e. the intersection should return "10,13,12,15,5" as the result.
How can I perform this intersection efficiently in c++?
vector<unsigned> ans;
for(vector<unsigned>::iterator i2=vecB.begin(),l2=vecB.end();i2!=l2;++i2)
{
    for(vector<pair<unsigned,unsigned> >::iterator i1=vecA.begin(),l1=vecA.end();i1!=l1;++i1)
    {
        // keep only the top (*i1).first bits of the 4-bit value before comparing
        if(((*i2)>>(4-(*i1).first))==(*i1).second)
        {
            ans.push_back(*i2); // store the matching value from vecB, once
            break;
        }
    }
}

(2,2) represents 10??, where we don't care what ?? are. This is the half-open range 1000 through 1100, aka [2 << 2, (2+1)<<2). In general, a pair (r, v) over 4-bit values covers [v << (4-r), (v+1) << (4-r)).
So, produce a set of ranges from the LHS. Fuse any that overlap. You'll have a set of disjoint start/finish intervals.
Now sort the RHS. Next, walk through it, keeping track when you enter/exit the LHS intervals. Those that are in the LHS intervals are in the intersection.
The RHS sorting takes O(|RHS| lg |RHS|). The walking takes O(|RHS| + |LHS|).
Making the LHS intervals takes O(|LHS| lg |LHS|) time (including time to sort by start-of-interval). Merging them is a single pass, also O(|LHS|).
So the end result is O(|RHS| lg |RHS| + |LHS| lg |LHS|) time to calculate intersection, instead of O(|RHS| * |LHS|) of your solution above.
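The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `prefixIntersection` and the `bits` parameter (the total width, 4 in the question) are made up here, and the result comes back in sorted order rather than in the original order of vecB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the values of vecB that match any
// (resolution, prefix) pair in vecA, in ascending order.
std::vector<unsigned> prefixIntersection(
    const std::vector<std::pair<unsigned, unsigned>>& vecA,
    std::vector<unsigned> vecB,   // taken by value so it can be sorted locally
    unsigned bits = 4) {
    // Turn each (resolution, prefix) pair into a half-open interval [lo, hi).
    std::vector<std::pair<unsigned, unsigned>> ranges;
    for (const auto& p : vecA) {
        assert(p.first <= bits);
        const unsigned shift = bits - p.first;
        ranges.emplace_back(p.second << shift, (p.second + 1) << shift);
    }
    std::sort(ranges.begin(), ranges.end());

    // Fuse intervals that overlap or touch, in a single pass.
    std::vector<std::pair<unsigned, unsigned>> merged;
    for (const auto& r : ranges) {
        if (!merged.empty() && r.first <= merged.back().second)
            merged.back().second = std::max(merged.back().second, r.second);
        else
            merged.push_back(r);
    }

    // Walk the sorted RHS and the merged intervals together.
    std::sort(vecB.begin(), vecB.end());
    std::vector<unsigned> result;
    std::size_t k = 0;
    for (unsigned v : vecB) {
        while (k < merged.size() && merged[k].second <= v) ++k;  // interval ends before v
        if (k < merged.size() && merged[k].first <= v) result.push_back(v);
    }
    return result;
}
```

With the question's data the five intervals collapse to [5, 6) and [8, 16), so the walk keeps 5, 10, 12, 13 and 15.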


How to get rid of "Time Limit Exceeded" [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I am trying to solve some problems on an online judge. I solved one problem with simple C++; it works, but it sometimes fails with a "Time limit exceeded" error. How can I get rid of this?
Here is the question that I solved
There are two integers A and B. You are required to compute the bitwise AND amongst all natural numbers lying between A and B, both inclusive.
Here is my code.
#include<iostream>
using namespace std;
int main()
{
int t,a,b;
long ans;
cin>>t;
while(t--)
{
cin>>a>>b;
ans=a;
for(int i=a+1; i<=b; i++)
{
ans=ans&i;
}
cout<<ans<<endl;
}
}
If you have two numbers, X and Y, each is represented by a finite sequence of bits:
X = Bx(1), ..., Bx(n)
and
Y = By(1), ..., By(n)
The bitwise AND of the two is computed bit by bit (written here with C++'s & operator):
X & Y = (Bx(1) & By(1)), ..., (Bx(n) & By(n))
A B A & B
0 0 0
0 1 0
1 0 0
1 1 1
We observe that:
all the bits can be computed separately, so we have as many equations as there are bits
in a sequence of logical statements joined by AND, the result is 0 if and only if ANY of the items is 0
So, if any of the numbers is even, then the last bit is 0. Otherwise, the last bit will be 1. If any number has a 0 as the penultimate bit, then the result for that bit will be 0. Otherwise it will be 1.
As a general rule, based on the pigeonhole principle, proposed by Dirichlet, if you have enough consecutive numbers for a given bit, then the result for that bit will be 0. For example, for the very last bit you have two variations, therefore, if you have at least two numbers in your consecutive set, then the last bit will be 0. If we take the very next bit, then you have four variations: 00, 01, 10 and 11. So, if you have at least 3 numbers in your consecutive set, then this bit is 0. For the next bit, you have 8 variations: 000, 001, 010, 011, 100, 101, 110, 111. So, if you have at least 5 numbers in your consecutive set, then this bit is 0. In general, bit k (counting from 0 at the last bit) is 0 whenever the set holds more than 2^k consecutive numbers.
Now, since we have a simple rule that determines most bits if there are many items, you will end up with only a few high bits that the rule above does not cover. For those bits you can check the first and the last number. If they have the same value for that bit, then that value will be the result, be it 0 or 1. If they differ, the result for that bit is 0.
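A minimal sketch of that rule, assuming non-negative inputs with A <= B. The function name `rangeAnd` is made up for illustration; it works with the difference B - A instead of the count so that nothing overflows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bitwise AND of every integer in [a, b], for a <= b.
std::uint64_t rangeAnd(std::uint64_t a, std::uint64_t b) {
    assert(a <= b);
    const std::uint64_t span = b - a;          // number of values minus one
    std::uint64_t result = 0;
    for (int k = 0; k < 64; ++k) {
        const std::uint64_t bit = std::uint64_t{1} << k;
        // More than 2^k consecutive values (span >= 2^k): bit k is 0 somewhere.
        if (span >= bit) continue;
        // Otherwise bit k survives only if both endpoints have it set.
        if ((a & bit) && (b & bit)) result |= bit;
    }
    return result;
}
```

Each test case then costs a fixed 64 steps regardless of how far apart A and B are, instead of one step per number in the range.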

Traversing all the subarrays of a 2-D array

I have a 2-D array given of size P*Q and I have to answer K questions based on the array. The 2-D array given consists of numbers 0 and 1. For each question, we have to count the maximum size of any square subarray in which no two same elements are adjacent.
For example if P=Q=8 and our given array be
00101010
00010101
10101010
01010101
10101010
01010101
10101010
01010101
Then the question Ki allows us to do Ki number of flips(0's to 1 or 1's to 0.)
Here K=4(number of questions)
1 2 0 10001
Output: 7 8 6 8
I have understood that for K1=1 we can change the value at array index (1,1) to 1 and get a valid 7*7 matrix, so the output is 7. If Ki>=2, the answer is 8.
What I think is that we have to maintain an array ans[k] which stores the maximum size of a valid square submatrix. For this, we can start at each index of the original array, traverse the subarrays starting there, and compute the maximum size for flip=i when starting from this index. We have to do this for the subarrays starting from each index and then store the maximum of them in flip[i].
I have problems implementing this, as I don't know how to traverse all the subarrays for a given index. I have been trying for a long time without success. Can anyone please help?
It helps to simplify the problem to depend only on individual values (rather than pairs of neighboring values). So XOR the grid with each perfect checkerboard:
01111111 10000000
10111111 01000000
11111111 00000000
11111111 00000000
11111111 00000000
11111111 00000000
11111111 00000000
11111111 00000000
where the goal is now to find the largest square in either grid that has no more than K_i 0s (obviously favoring the left one here).
Start with K_i=0. To find the largest square of 1s, compute for each cell the number of 1s in a row and a column starting at it (0 for a cell that contains a 0); the largest square with that cell as its upper-left corner (assuming it's a 1) is then one more than the minimum of the row length of its right neighbor, the column length of its lower neighbor, and the square-size of its lower-right neighbor. (All these are 0 for the non-existent cells outside the grid.) Visit the cells in diagonal-major order to have these values available when you need them; note the largest square size produced.
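A rough sketch of the K_i=0 case, under some assumptions: the grid is given as equal-length strings of '0' and '1', the function name `largestAlternatingSquare` is invented here, and cells are visited in reverse row-major order (bottom-right to top-left) rather than diagonal-major order, which also makes the three neighbor values available in time.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: side of the largest square that already alternates
// like a checkerboard, i.e. needs zero flips.
int largestAlternatingSquare(const std::vector<std::string>& grid) {
    assert(!grid.empty());
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    int best = 0;
    for (int parity = 0; parity < 2; ++parity) {       // one pass per checkerboard
        // One extra row and column of zeros stand in for cells outside the grid.
        std::vector<std::vector<int>> row(rows + 1, std::vector<int>(cols + 1, 0));
        std::vector<std::vector<int>> col = row, sq = row;
        for (int i = rows - 1; i >= 0; --i) {
            for (int j = cols - 1; j >= 0; --j) {
                // XOR the cell with the checkerboard; 0 means "would need a flip".
                const int x = (grid[i][j] - '0') ^ ((i + j + parity) & 1);
                if (x == 0) continue;                    // all three values stay 0
                row[i][j] = 1 + row[i][j + 1];           // run of 1s to the right
                col[i][j] = 1 + col[i + 1][j];           // run of 1s downwards
                sq[i][j] = 1 + std::min({row[i][j + 1], col[i + 1][j], sq[i + 1][j + 1]});
                best = std::max(best, sq[i][j]);
            }
        }
    }
    return best;
}
```

For the 8x8 example this yields 6, matching the expected output for the query with 0 flips.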
To generalize to K_i>0, store for each cell those three values (row length, column length, and square size) for each number of flips up to K_i. A cell with a 1 adds 1 to each row/column length as before, while a cell with a 0 shifts those lengths to the next flip count, discarding those whose flip count is now too large and adding a new value of 0 for 0 flips. For each combination of row-length-east, column-length-south, and square-size-southeast, each with a flip count, a cell gets a candidate square size that is their minimum with the sum of their flip counts, plus one if the cell is a 0 itself. For each flip count (that isn't too large), keep the largest square size, noting if it is the largest so far encountered (for that flip count).
Note that the brute-force solution may be nearly as fast when the squares are much smaller than the array, since it need visit each one only a small number of times.
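For comparison, here is one way a brute-force baseline could look. It is not the per-flip-count scheme described above but a different, simpler technique: 2-D prefix sums over the mismatch counts, so that the number of flips any square needs can be read off in constant time. The name `largestSquareWithFlips` and the string-based grid are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical brute force: side of the largest square that can be turned
// into a checkerboard with at most k flips.
int largestSquareWithFlips(const std::vector<std::string>& grid, long long k) {
    assert(!grid.empty() && k >= 0);
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    int best = 0;
    for (int parity = 0; parity < 2; ++parity) {
        // bad[i][j] = mismatches against this checkerboard in rows [0,i), cols [0,j).
        std::vector<std::vector<int>> bad(rows + 1, std::vector<int>(cols + 1, 0));
        for (int i = 0; i < rows; ++i)
            for (int j = 0; j < cols; ++j) {
                const int mismatch = (grid[i][j] - '0') ^ ((i + j + parity) & 1);
                bad[i + 1][j + 1] = mismatch + bad[i][j + 1] + bad[i + 1][j] - bad[i][j];
            }
        for (int i = 0; i < rows; ++i)
            for (int j = 0; j < cols; ++j)
                // Only sizes that would beat the current best are worth trying.
                for (int s = best + 1; i + s <= rows && j + s <= cols; ++s) {
                    const int flips = bad[i + s][j + s] - bad[i][j + s]
                                    - bad[i + s][j] + bad[i][j];
                    if (flips > k) break;   // larger squares at this corner need even more
                    best = s;
                }
    }
    return best;
}
```

Squares sharing a top-left corner are nested, so the flip count never decreases as the side grows, which is what justifies the early `break`.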

Data structure for fast range searches of dense dataset 4D vectors

I have millions of unstructured 3D vectors associated with arbitrary values, making for a set of 4D vectors. To make it simpler to understand: I have Unix timestamps, each associated with hundreds of thousands of 3D vectors. And I have many timestamps, making for a very large dataset: upwards of 30 million vectors.
I need to search particular datasets of specific timestamps.
So let's say I have the following data:
For time stamp 1407633943:
(0, 24, 58, 1407633943)
(9, 2, 59, 1407633943)
...
For time stamp 1407729456:
(40, 1, 33, 1407729456)
(3, 5, 7, 1407729456)
...
etc etc
And I wish to make a very fast query along the lines of:
Query Example 1:
Give me vectors between:
X > 4 && X < 9 && Y > -29 && Y < 100 && Z > 0.58 && Z < 0.99
Give me list of those vectors, so I can find the timestamps.
Query Example 2:
Give me vectors between:
X > 4 && X < 9 && Y > -29 && Y < 100 && Z > 0.58 && Z < 0.99 && W (timestamp) = 1407729456
So far I've used SQLite for the task, but even after column indexing, the thing takes between 500ms - 7s per query. I'm looking for somewhere between 50ms-200ms per query solution.
What sort of structures or techniques can I use to speed the query up?
Thank you.
kd-trees can be helpful here. Range search in a kd-tree is a well-known problem. The time complexity of one query depends on the output size, of course (in the worst case the whole tree is traversed, if all vectors fall inside the query range). But it can work pretty fast on average.
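As a rough illustration of the idea, here is a small kd-tree over the three spatial coordinates, stored implicitly in a vector (the median of each sub-range sits at its midpoint). The `Sample` struct and the function names are assumptions made for this sketch, and the bounds are strict to mirror the `>` and `<` in the question's queries.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Sample {                       // hypothetical record: (x, y, z) plus timestamp
    std::array<double, 3> pos;
    std::int64_t timestamp;
};

// Rearranges pts[lo, hi) so the median on `axis` sits at the midpoint,
// then recurses on both halves with the next axis.
void buildKdTree(std::vector<Sample>& pts, std::size_t lo, std::size_t hi, int axis) {
    assert(lo <= hi && hi <= pts.size());
    if (hi - lo <= 1) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Sample& a, const Sample& b) {
                         return a.pos[axis] < b.pos[axis];
                     });
    buildKdTree(pts, lo, mid, (axis + 1) % 3);
    buildKdTree(pts, mid + 1, hi, (axis + 1) % 3);
}

// Appends every sample with mn[d] < pos[d] < mx[d] on all three axes.
void rangeQuery(const std::vector<Sample>& pts, std::size_t lo, std::size_t hi, int axis,
                const std::array<double, 3>& mn, const std::array<double, 3>& mx,
                std::vector<Sample>& out) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    const Sample& s = pts[mid];
    bool inside = true;
    for (int d = 0; d < 3; ++d)
        inside = inside && mn[d] < s.pos[d] && s.pos[d] < mx[d];
    if (inside) out.push_back(s);
    const int next = (axis + 1) % 3;
    // Left half holds values <= the split; skip it if none can exceed mn.
    if (mn[axis] < s.pos[axis]) rangeQuery(pts, lo, mid, next, mn, mx, out);
    // Right half holds values >= the split; skip it if none can be below mx.
    if (s.pos[axis] < mx[axis]) rangeQuery(pts, mid + 1, hi, next, mn, mx, out);
}
```

For a query that also fixes the timestamp, one option is to filter the returned samples on `timestamp`; another is to build one tree per timestamp. Which is faster depends on the data and would need measuring.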
I would use octree. In each node I would store arrays of vectors in a hashtable using the timestamp as a key.
To further increase the performance you can use CUDA, OpenCL, OpenACC, OpenMP and implement the algorithms to be executed in parallel on the GPU or a multi-core CPU.
BKaun: please accept my attempt at giving you some insight into the problem at hand. I suppose you have thought of every one of my points, but maybe seeing them here will help.
Regardless of how ingest data is presented, consider that, using the C programming language, you can reduce the storage size of the data to minimize space and search time. You will be searching for, loading, and parsing single bits of a vector instead of, say, a SHORT INT which is 2 bytes for every entry - or a FLOAT which is much more. The object, as I understand it, is to search the given data for given values of X, Y, and Z and then find the timestamp associated with these 3 while optimizing the search. My solution does not go into the search, but merely the data that is used in a search.
To illustrate my hints simply, I'm considering that the data consists of 4 vectors:
X between -2 and 7,
Y between 0.17 and 3.08,
Z between 0 and 50,
timestamp (many of same size - 10 digits)
To optimize, consider how many various numbers each vector can have in it:
1. X can be only 10 numbers (include 0)
2. Y can be 3.08 minus 0.17 = 2.91 x 100 = 291 numbers
3. Z can be 51 numbers
4. timestamp can be many (but in this scenario,
you are not searching for a certain one)
Consider how each variable is stored as a binary:
1. Each entry in Vector X COULD be stored in 4 bits, using the first bit=1 for
the negative sign:
7="0111"
6="0110"
5="0101"
4="0100"
3="0011"
2="0010"
1="0001"
0="0000"
-1="1001"
-2="1010"
However, the original data that you are searching through may range
from -10 to 20!
Therefore, adding another 2 bits gives you a table like this:
-10="101010"
-9="101001" ...
...
-2="100010"
-1="100001" ...
...
8="001000"
9="001001" ...
...
19="010011"
20="010100"
And that's only 6 bits to store each X vector entry for integers from -10 to 20
For search purposes on a range of -10 to 20, there are 31 different X Vector entries
possible to search through.
Each entry in Vector Y COULD be stored in 9 bits (no extra sign bit is needed)
The 1's and 0's COULD be stored (accessed, really) in 2 parts
(integer part, and a 2 digit decimal).
Part 1 can be 0, 1, 2, or 3 (4 values of 2 bits each, from "00" to "11")
However, if the range of the entire Y dataset is 0 to 10,
part 1 can be 0, 1, ...9, 10 (which is 11 values of 4 bits each,
from "0000" to "1010")
Part 2 can be 00, 01,...98, 99 (100 values of 7 bits each, from "0000000" to "1100011")
Total storage bits for Vector Y entries is 4 + 7 = 11 bits in the
range 00.00 to 10.99
For search purposes on a range 00.00 to 10.99, there are 1100 different Y Vector
entries possible to search through (11x100)
Each entry in Vector Z in the range of 0 to 50 COULD be stored in 6 bits
("000000" to "110010").
Again, the actual data range may be 7 bits long (for simplicity's sake)
0 to 64 ("0000000" to "1000000")
For search purposes on a range of 0 to 64, there are 65 different Z Vector entries
possible to search through.
Consider that you will be storing the data in this optimized format, in a single
succession of bits:
X=4 bits + 2 range bits = 6 bits
+ Y=4 bits part 1 and 7 bits part 2 = 11 bits
+ Z=7 bits
+ timestamp (10 numbers - each from 0 to 9 ("0000" to "1001") 4 bits each = 40 bits)
= TOTAL BITS: 6 + 11 + 7 + 40 = 64 stored bits for each 4D vector
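To make the 64-bit layout concrete, here is one possible packing routine. It is only a sketch: the field order (X in the top bits, timestamp in the bottom 40), the function names, and the choice to pass Y as separate whole and hundredths parts are all assumptions, not something the layout above prescribes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical packing, most significant bits first:
// X (6 bits, sign + magnitude) | Y whole (4) | Y hundredths (7) | Z (7) |
// timestamp (10 decimal digits, 4 bits each = 40)
std::uint64_t packRecord(int x, unsigned yWhole, unsigned yHundredths,
                         unsigned z, std::uint64_t timestamp) {
    assert(x >= -31 && x <= 31);                       // 5 magnitude bits
    assert(yWhole <= 15 && yHundredths <= 99 && z <= 127);
    assert(timestamp <= 9999999999ULL);                // at most 10 digits
    const std::uint64_t xBits =
        (x < 0 ? 0x20u : 0u) | static_cast<unsigned>(std::abs(x));
    std::uint64_t bcd = 0;                             // one decimal digit per nibble
    for (int digit = 0; digit < 10; ++digit) {
        bcd |= (timestamp % 10) << (4 * digit);
        timestamp /= 10;
    }
    return (xBits << 58) | (std::uint64_t{yWhole} << 54) |
           (std::uint64_t{yHundredths} << 47) | (std::uint64_t{z} << 40) | bcd;
}

// Recovers X from the top 6 bits of a packed record.
int unpackX(std::uint64_t record) {
    const unsigned bits = static_cast<unsigned>(record >> 58);
    const int magnitude = static_cast<int>(bits & 0x1F);
    return (bits & 0x20) ? -magnitude : magnitude;
}
```

With this layout, X = -10 lands in the top six bits as 101010, the same pattern as in the table above, and the timestamp reads back digit by digit when the low 40 bits are printed in hexadecimal.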
THE SEARCH:
Input xx, yy, zz to search for in arrays X, Y and Z (which are stored in binary)
Change xx, yy, and zz to binary bit strings per optimized format above.
function(xx, yy, zz)
Search for X first, since it has 31 possible outcomes (range is -10 to 20)
- the lowest number of any array
First search for positive targets (there are 8 of them and better chance
of finding one)
These all start with "000"
7="000111"
6="000110"
5="000101"
4="000100"
3="000011"
2="000010"
1="000001"
0="000000"
So you can check if the first 3 bits = "000". If so, you have a number
between 0 and 7.
Found: search for Z
Else search for xx=-2 or -1: does X = -2="100010" or -1="100001" ?
(do second because there are only 2 of them)
Found: Search for Z
NotFound: next X
Search for Z after X is Found: (Z second, since it has 65 possible outcomes
- range is 0 to 64)
You are searching for 6 bits of a 7 bit binary number
("0000000" to "1000000") If bits 1,2,3,4,5,6 are all "0", analyze bit 0.
If it is "1" (it's 64), next Z
Else begin searching 6 bits ("000000" to "110010") with LSB first
Found: Search for Y
NotFound: Next X
Search for Y (Y last, since it has 1100 possible outcomes - range is 0.00 to 10.99)
Search for Part 1 (integer part) bits (you are searching for
"0000", "0001", "0010" or "0011" only, so use yyPt1=YPt1)
Found: Search for Part 2 ("0000000" to "1100100") using yyPt2=YPt2
(direct comparison)
Found: Print out X, Y, Z, and timestamp
NotFound: Search criteria for X, Y, and Z not found in data.
Print X,Y,Z,"timestamp not found". Ask for new X, Y, Z. New search.

Counting number of bits: How does this line work ? n=n&(n-1); [duplicate]

This question already has answers here:
n & (n-1) what does this expression do? [duplicate]
(4 answers)
Closed 6 years ago.
I need some explanation how this specific line works.
I know that this function counts the number of 1 bits, but how exactly does this line clear the rightmost 1 bit?
int f(int n) {
int c;
for (c = 0; n != 0; ++c)
n = n & (n - 1);
return c;
}
Can someone explain it to me briefly or give some "proof"?
Any nonzero unsigned integer 'n' ends in the following k digits (for some k >= 1): One followed by (k-1) zeroes: 100...0
Note that k can be 1 in which case there are no zeroes.
(n - 1) will end in this format: Zero followed by (k-1) 1's: 011...1
The bits to the left of those k digits are the same in n and (n - 1), so the & leaves them unchanged. n & (n-1) will therefore end in 'k' zeroes: 100...0 & 011...1 = 000...0
Hence n & (n - 1) will eliminate the rightmost '1'. Each iteration of this will basically remove the rightmost '1' digit and hence you can count the number of 1's.
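A small sketch that separates the two ideas, using unsigned values as in the explanation above. The function names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: clears the rightmost 1 bit of a nonzero value.
unsigned clearLowestSetBit(unsigned n) {
    assert(n != 0);
    return n & (n - 1);   // ...100..0 & ...011..1 = ...000..0
}

// Counts 1 bits by clearing the rightmost one until none remain.
unsigned countSetBits(unsigned n) {
    unsigned count = 0;
    while (n != 0) {
        n = clearLowestSetBit(n);
        ++count;          // one increment per bit removed
    }
    return count;
}
```

For example, 88 is 1011000 in binary; clearing its rightmost 1 gives 1010000, which is 80. The loop runs once per set bit, so it finishes in three iterations for 88 no matter how wide `unsigned` is.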
I've been brushing up on bit manipulation and came across this. It may not be useful to the original poster now (3 years later), but I am going to answer anyway to improve the quality for other viewers.
What does it mean for n & (n-1) to equal zero?
We should make sure we know that since that is the only way to break the loop (n != 0).
Let's say n=8. The bit representation for that would be 00001000. The bit representation for n-1 (or 7) would be 00000111. The & operator returns the bits set in both arguments. Since 00001000 and 00000111 have no set bits in common, the result is 00000000 (or zero).
You may have caught on that the number 8 wasn't randomly chosen. It is an example where n is a power of 2. All powers of 2 (2,4,8,16,etc) will have the same result.
What happens when you pass something that is not a power of 2? For example, when n=6, the bit representation is 00000110 and n-1=5 or 00000101. The & is applied to these 2 arguments, and the only bit they have in common is the one worth 4. Now n=4, which is not zero, so we increment c and repeat the process with n=4. As we've seen above, 4 is a power of 2, so the next iteration turns n into 0 and the loop ends. Each iteration cuts off the rightmost 1 bit until none are left.
What is c?
It starts at 0 and is incremented once per loop iteration, so c is the number of 1 bits that were cut off before n reached zero. For n=6 that is 2.

how to find integers over a range which contains different digits [duplicate]

This question already exists:
Closed 10 years ago.
Possible Duplicate:
c++ program to find total numbers of integers containing different digits
Suppose I have an unsigned integer, call it low, and another, call it high, such that high>low. The problem is to count the integers in this range whose digits are all distinct. For example, if low is 1 and high is 10, then the answer is 10, because every number in this range has distinct digits. If low is 1 and high is 12, then the answer is 11, because 11 contains repeated digits. For example, 123, 234 and 4567 are valid numbers, but 121, 2342 and 4546 are invalid. I am not looking for a brute-force algorithm; if anyone has a better solution than the usual brute-force approach, please tell me.
I would derive an algorithm to determine the number of such numbers from 0-n; then you could simply compute (# of valid numbers 0-high) - (# of valid numbers 0-(low-1)). To get the valid numbers 0-n, look at the number of digits in your number: if n has 5 digits for example, every valid 1, 2, 3, and 4 digit number is in your result set. So for, say, 4 digit numbers, you compute all the possible combinations of digits in a 4 digit number: 1234, 1235, 1236...5678, 5789, and 6789. Then count the number of permutations (1234 can also be 1243, 1324, 1342...etc), and multiply (# of permutations) x (# of distinct sequences you derived in the previous step). Then you have your answer for all 4 digit numbers. Do the same for each other set, and come up with something more specific for your last set; if high is 5500, you need the valid numbers between 5000-5500. You can apply a similar algorithm, but instead of using all digits 0-9, you use 9 distinct digits, omitting the '5'. Note that all numbers can have a 0 in them as well, but not at the beginning, so the algorithm would need to account for that too.
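One way to turn this counting idea into code is sketched below. It is a variation rather than a literal transcription: instead of listing digit combinations and multiplying by permutations, it counts arrangements position by position, which handles the leading-zero restriction directly. The function names are invented here, and low >= 1 is assumed.

```cpp
#include <cassert>
#include <string>

// Ways to fill `slots` positions with distinct digits chosen from
// `available` digits that are still unused.
unsigned long long arrangements(int available, int slots) {
    if (slots > available) return 0;
    unsigned long long ways = 1;
    for (int i = 0; i < slots; ++i) ways *= static_cast<unsigned long long>(available - i);
    return ways;
}

// Hypothetical helper: how many integers in [1, n] have all-distinct digits.
unsigned long long countDistinctUpTo(unsigned long long n) {
    if (n == 0) return 0;
    const std::string s = std::to_string(n);
    const int len = static_cast<int>(s.size());
    unsigned long long total = 0;
    // Shorter numbers: first digit 1-9, the rest from the 9 remaining digits.
    for (int digits = 1; digits < len; ++digits)
        total += 9 * arrangements(9, digits - 1);
    // Same length: keep n's prefix, put a smaller unused digit at position i.
    bool used[10] = {false};
    for (int i = 0; i < len; ++i) {
        const int d = s[i] - '0';
        for (int c = (i == 0 ? 1 : 0); c < d; ++c)
            if (!used[c]) total += arrangements(10 - (i + 1), len - i - 1);
        if (used[d]) return total;   // n's own prefix repeats a digit
        used[d] = true;
    }
    return total + 1;                // n itself has distinct digits
}

// Count over the inclusive range [low, high], assuming 1 <= low <= high.
unsigned long long countDistinctInRange(unsigned long long low, unsigned long long high) {
    assert(low >= 1 && low <= high);
    return countDistinctUpTo(high) - countDistinctUpTo(low - 1);
}
```

The work is proportional to the number of digits of `high` (times ten), not to the size of the range.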
Simply convert your number to a string and then run over it, checking whether each character has already occurred in the string, e.g.:
#include <string>
int main()
{
    std::string s = std::to_string(12345);
    bool occurredCheck[10] = {false}; // one flag per decimal digit, all initially false
    bool isValidNumber = true;
    for(int i = static_cast<int>(s.length()) - 1; i >= 0; --i)
    {
        int digit = s[i] - '0';
        if(occurredCheck[digit]) isValidNumber = false; // this digit was seen before
        occurredCheck[digit] = true; // remember that the digit occurred
    }
}
The if-line clears isValidNumber when a digit occurs a second time; the line after it records that the digit has been seen.
And isValidNumber lets you know whether it actually is a valid number for you.
BTW: This example needs C++11 for std::to_string.
Using this check you can detect the first invalid number and then adjust your range accordingly.