I am working on a cryptocurrency and there is a calculation that nodes must make:
average /= total;
double ratio = average/DESIRED_BLOCK_TIME_SEC;
int delta = -round(log2(ratio));
It is required that every node gets exactly the same result no matter what architecture or standard library the system uses. My understanding is that log2 might have different implementations that yield very slightly different results, and that flags like -ffast-math could affect the output.
Is there a simple way to convert the above calculation to something that is verifiably portable across different architectures (fixed point?), or am I overthinking the precision that is needed (given that I round the answer at the end)?
EDIT: Average is a long and total is an int... so average ends up rounded to the closest second.
DESIRED_BLOCK_TIME_SEC is #defined as 30.0 (a float).
For this kind of calculation to be exact, one must either calculate all the divisions and logarithms exactly -- or one can work backwards.
-round(log2(x)) == round(log2(1/x)), meaning that one of the divisions can be turned around to get (1/x) >= 1.
round(log2(x)) == floor(log2(x * sqrt(2))) == binary_log((int)(x*sqrt(2))).
One minor detail here is whether (double)sqrt(2) rounds down or up. If it rounds up, then there might exist one or more values where x * sqrt(2) == 2^n + epsilon (after rounding), whereas if it rounded down we would get 2^n - epsilon. One would give the integer value n, the other n-1. Which is correct?
Naturally, the correct one is the one whose ratio to the theoretical midpoint x * sqrt(2) is smaller.
x * sqrt(2) / 2^(n-1) < 2^n / (x * sqrt(2)) -- multiply by x*sqrt(2)
x^2 * 2 / 2^(n-1) < 2^n -- multiply by 2^(n-1)
x^2 * 2 < 2^(2*n-1)
In order for this comparison to be exact, x^2 or pow(x,2) must also be exact on the boundary -- and it matters what range the original values are in. A similar analysis can and should be done while expanding x = a/b, so that the inexactness of the division can be mitigated at the cost of possible overflow in the multiplication...
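As a sketch of that idea: if the ratio is a quotient of two positive integers a/b (both comfortably below 2^30, so the squares fit in 64 bits), round(log2(a/b)) can be computed with integer arithmetic only. The function below is my own illustration, not code from any node implementation:
#include <cstdint>
// Exact n = round(log2(a/b)) for positive integers a, b (both < 2^30).
// round(log2(x)) == n  <=>  2^(2n-1) <= x^2 < 2^(2n+1); with x = a/b,
// A = 2*a^2 and B = b^2 this becomes  B*4^n <= A < B*4^(n+1).
int round_log2_ratio(uint64_t a, uint64_t b)
{
    uint64_t A = 2 * a * a;
    uint64_t B = b * b;
    int n = 0;
    while (A < B)         { A <<= 2; --n; }  // ratio < 1: scale A up, step n down
    while (A >= (B << 2)) { B <<= 2; ++n; }  // ratio >= 2^(n+1/2): step n up
    return n;
}
The delta from the question would then be -round_log2_ratio(average, 30). Ties cannot occur, because (a/b)^2 can never equal an odd power of two exactly.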
Then again, I wonder how all the other similar applications handle the corner cases, which may not even exist -- and those could be brute force searched assuming that average and total are small enough integers.
EDIT
Because average is an integer, it makes sense to tabulate those exact integer values, which are on the boundaries of -round(log2(average)).
From octave: d = -round(log2((1:1000000)/30.0)); [1, find(d(2:end) ~= d(1:end-1)) + 1]
1 2 3 6 11 22 43 85 170 340 679 1358 2716
5431 10862 21723 43445 86890 173779 347558 695115
All the averages in [1, 2) -> 5
All the averages in [2, 3) -> 4
All the averages in [3, 6) -> 3
...
All the averages in [43445, 86890) -> -11
int a = find_lower_bound(average, table); // linear or binary search
return 5 - a;
No floating point arithmetic needed
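A sketch of that lookup in C++, using the boundary values listed above (valid for the tabulated range, average between 1 and 10^6):
#include <algorithm>
#include <cstdint>
// Lower bounds of the averages for which delta is 5, 4, 3, ... respectively.
static const int64_t table[] = {
    1, 2, 3, 6, 11, 22, 43, 85, 170, 340, 679, 1358, 2716,
    5431, 10862, 21723, 43445, 86890, 173779, 347558, 695115
};
int delta_for(int64_t average)   // assumes average >= 1
{
    // index of the last boundary <= average (linear or binary search)
    int a = int(std::upper_bound(std::begin(table), std::end(table), average)
                - std::begin(table)) - 1;
    return 5 - a;
}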
I am computing pow(2,i), where i can range over 0 <= i <= 100000.
Apart from that, I have MOD=1000000007.
int powers[100001];
powers[0] = 1;
for (int i = 1; i <= 100000; ++i)
{
    powers[i] = (powers[i-1] * 2) % MOD;
}
For i=100000, won't the power value become greater than MOD?
How do I store the power correctly?
The operation doesn't look feasible to me.
I am getting correct values only up to about i=70, I guess.
I have to compute sum += ar[i]*pow(2,i) and finally print sum % 1000000007, where ar[i] is an additional array with some big numbers, up to 10^5.
As long as your modulus value is less than half the capacity of your data type, it will never be exceeded. That's because you take the previous value in the range 0..1000000006, double it, then re-modulo it bringing it back to that same range.
However, I can't guarantee that higher values won't cause you troubles, it's more mathematical analysis than I'm prepared to invest given the simple alternative. You could spend a lot of time analysing, checking and debugging, but it's probably better just to not allow the problem to occur in the first place.
The alternative? I'd tend to use the pre-generation method (having a program do the gruntwork up front, inserting the pre-generated values into an array easily and speedily accessible from your real program).
With this method, you can use tools that are well tested and known to work with massive values. Since this data is not going to change, it's useless calculating it every time your program starts.
If you want an easy (and efficient) way to do this, the following bash script in conjunction with bc and awk can do this:
#!/usr/bin/bash
bc >nums.txt <<EOF
i = 1;
for (x = 0;x <= 10000; x++) {
i % 1000000007;
i = i * 2;
}
EOF
awk 'BEGIN { printf "static int array[] = {" }
{ if (NR % 5 == 1) printf "\n ";
printf "%s, ",$0;
next
}
END { print "\n};" }' nums.txt
The bc part is the "meat" of the matter, it creates the large powers of two and outputs them modulo the number you provided. The awk part is simply to format them in C-style array elements, five per line.
Just take the output of that and put it into your code and, voila, there you have it, a compile-time-expensed array that you can use for fast lookup.
It takes only a second and a half on my box to generate the array and then you never need to do it again. You also won't have to concern yourself with the vagaries of modulo math :-)
static int array[] = {
1,2,4,8,16,
32,64,128,256,512,
1024,2048,4096,8192,16384,
32768,65536,131072,262144,524288,
1048576,2097152,4194304,8388608,16777216,
33554432,67108864,134217728,268435456,536870912,
73741817,147483634,294967268,589934536,179869065,
359738130,719476260,438952513,877905026,755810045,
511620083,23240159,46480318,92960636,185921272,
371842544,743685088,487370169,974740338,949480669,
898961331,797922655,595845303,191690599,383381198,
766762396,533524785,67049563,134099126,268198252,
536396504,72793001,145586002,291172004,582344008,
164688009,329376018,658752036,317504065,635008130,
270016253,540032506,80065005,160130010,320260020,
640520040,281040073,562080146,124160285,248320570,
:
861508356,723016705,446033403,892066806,784133605,
568267203,136534399,273068798,546137596,92275185,
184550370,369100740,738201480,476402953,952805906,
905611805,
};
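Once you have the table, the sum the question asks about is one line per element; the only thing to watch is that each product needs a 64-bit intermediate before the reduction. A sketch (ar[] and N stand for whatever the real inputs are, and array[] is assumed to hold at least N entries):
long long sum_mod(const int *ar, int N)
{
    const long long MOD = 1000000007;
    long long sum = 0;
    for (int i = 0; i < N; ++i)
        // both factors are below 2^31, so the product fits in a long long
        sum = (sum + (long long)ar[i] % MOD * array[i]) % MOD;
    return sum;
}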
Notice that your modulus fits in an int: MOD=1000000007 (decimal) is 0b00111011100110101100101000000111 in binary and can be stored in 32 bits.
- i pow(2,i) bit representation
- 0 1 0b00000000000000000000000000000001
- 1 2 0b00000000000000000000000000000010
- 2 4 0b00000000000000000000000000000100
- 3 8 0b00000000000000000000000000001000
- ...
- 29 536870912 0b00100000000000000000000000000000
The tricky part starts when pow(2,i) is greater than MOD=1000000007, but if you know that the current pow(2,i) will be greater than MOD, you can see what the bits look like after the reduction mod MOD:
- i  pow(2,i)    pow(2,i)%MOD  bit representation
- 30 1073741824  73741817      0b000100011001010011010111111001
- 31 2147483648  147483634     0b001000110010100110101111110010
- 32 4294967296  294967268     0b010001100101001101011111100100
- 33 8589934592  589934536     0b100011001010011010111111001000
So once you have pow(2,i-1)%MOD, you can multiply that reduced value by 2, instead of the full pow(2,i-1), and reduce again whenever the result exceeds MOD.
For example, for i=34 you use (589934536*2) mod 1000000007 instead of (8589934592*2) mod 1000000007, because 8589934592 cannot be stored in an int.
Additionally, you can use a bit operation instead of the multiplication for pow(2,i):
the bit operation equivalent to multiplying by 2 is a left shift.
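Put together, the table from the question can be built like this (a sketch; every stored value stays below MOD, and 2*(MOD-1) still fits in a 32-bit int, so a single conditional subtraction replaces the full modulo):
const int MOD = 1000000007;
int powers[100001];
void build_powers()
{
    powers[0] = 1;
    for (int i = 1; i <= 100000; ++i) {
        int doubled = powers[i - 1] << 1;            // same as * 2
        powers[i] = (doubled >= MOD) ? doubled - MOD : doubled;
    }
}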
I have millions of unstructured 3D vectors associated with arbitrary values - making for a set of 4D vectors. To make it simpler to understand: I have unix time stamps associated with hundreds of thousands of 3D vectors. And I have many time stamps, making for a very large dataset; upwards of 30 million vectors.
I need to search this data, sometimes restricted to specific time stamps.
So lets say I have the following data:
For time stamp 1407633943:
(0, 24, 58, 1407633943)
(9, 2, 59, 1407633943)
...
For time stamp 1407729456:
(40, 1, 33, 1407729456)
(3, 5, 7, 1407729456)
...
etc etc
And I wish to make a very fast query along the lines of:
Query Example 1:
Give me vectors between:
X > 4 && X < 9 && Y > -29 && Y < 100 && Z > 0.58 && Z < 0.99
Give me list of those vectors, so I can find the timestamps.
Query Example 2:
Give me vectors between:
X > 4 && X < 9 && Y > -29 && Y < 100 && Z > 0.58 && Z < 0.99 && W (timestamp) = 1407729456
So far I've used SQLite for the task, but even after column indexing, a query takes anywhere between 500 ms and 7 s. I'm looking for a solution somewhere in the 50-200 ms per query range.
What sort of structures or techniques can I use to speed the query up?
Thank you.
kd-trees can be helpful here. Range search in a kd-tree is a well-known problem. The time complexity of one query depends on the output size, of course (in the worst case the whole tree will be traversed if every vector matches). But it can work pretty fast on average.
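A minimal sketch of that approach: an implicit kd-tree built in place by median splits, plus a conservative range query (the names and layout are mine, not a library API):
#include <algorithm>
#include <cstdint>
#include <vector>
struct Point { float x[3]; int64_t t; };   // 3D vector + timestamp
// Build: put the median of the subrange on the split axis in the middle,
// then recurse on both halves, cycling the axis. The tree is implicit.
void build(std::vector<Point>& pts, int lo, int hi, int axis = 0)
{
    if (hi - lo <= 1) return;
    int mid = (lo + hi) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
        [axis](const Point& a, const Point& b) { return a.x[axis] < b.x[axis]; });
    build(pts, lo, mid, (axis + 1) % 3);
    build(pts, mid + 1, hi, (axis + 1) % 3);
}
// Report every point inside the box [minv[i], maxv[i]] on all three axes.
void rangeQuery(const std::vector<Point>& pts, int lo, int hi, int axis,
                const float minv[3], const float maxv[3], std::vector<Point>& out)
{
    if (lo >= hi) return;
    int mid = (lo + hi) / 2;
    const Point& p = pts[mid];
    if (p.x[0] >= minv[0] && p.x[0] <= maxv[0] &&
        p.x[1] >= minv[1] && p.x[1] <= maxv[1] &&
        p.x[2] >= minv[2] && p.x[2] <= maxv[2])
        out.push_back(p);
    if (minv[axis] <= p.x[axis])   // box may overlap the lower half
        rangeQuery(pts, lo, mid, (axis + 1) % 3, minv, maxv, out);
    if (maxv[axis] >= p.x[axis])   // box may overlap the upper half
        rangeQuery(pts, mid + 1, hi, (axis + 1) % 3, minv, maxv, out);
}
Query example 2 is the same thing with one extra check on p.t next to the coordinate checks (or a separate tree per timestamp).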
I would use an octree. In each node I would store arrays of vectors in a hashtable, using the timestamp as a key.
To further increase the performance you can use CUDA, OpenCL, OpenACC, OpenMP and implement the algorithms to be executed in parallel on the GPU or a multi-core CPU.
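The timestamp-bucketing part of that suggestion is easy to try on its own; a sketch without the octree inside each bucket (for query example 2 it already reduces the scan to a single bucket):
#include <cstdint>
#include <unordered_map>
#include <vector>
struct Vec3 { float x, y, z; };
std::unordered_map<int64_t, std::vector<Vec3>> byTime;   // one bucket per timestamp
std::vector<Vec3> query(int64_t t, float xmin, float xmax,
                        float ymin, float ymax, float zmin, float zmax)
{
    std::vector<Vec3> out;
    auto it = byTime.find(t);
    if (it == byTime.end()) return out;
    for (const Vec3& v : it->second)
        if (v.x > xmin && v.x < xmax && v.y > ymin && v.y < ymax &&
            v.z > zmin && v.z < zmax)
            out.push_back(v);
    return out;
}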
BKaun: please accept my attempt at giving you some insight into the problem at hand. I suppose you have thought of every one of my points, but maybe seeing them here will help.
Regardless of how the ingest data is presented, consider that, using the C programming language, you can reduce the storage size of the data to minimize space and search time. You will be searching for, loading, and parsing single bits of a vector instead of, say, a short int, which is 2 bytes for every entry, or a float, which is much more. The object, as I understand it, is to search the given data for given values of X, Y, and Z and then find the timestamp associated with these three, while optimizing the search. My solution does not go into the search itself, but merely the data that is used in a search.
To illustrate my hints simply, I'm considering that the data consists of 4 vectors:
X between -2 and 7,
Y between 0.17 and 3.08,
Z between 0 and 50,
timestamp (many of same size - 10 digits)
To optimize, consider how many various numbers each vector can have in it:
1. X can be only 10 numbers (including 0)
2. Y can be (3.08 minus 0.17) x 100 + 1 = 292 numbers
3. Z can be 51 numbers
4. timestamp can be many (but in this scenario,
you are not searching for a certain one)
Consider how each variable is stored as a binary:
1. Each entry in Vector X COULD be stored in 4 bits, using the first bit=1 for
the negative sign:
7="0111"
6="0110"
5="0101"
4="0100"
3="0011"
2="0010"
1="0001"
0="0000"
-1="1001"
-2="1010"
However, the original data that you are searching through may range
from -10 to 20!
Therefore, adding another 2 bits gives you a table like this:
-10="101010"
-9="101001" ...
...
-2="100010"
-1="100001" ...
...
8="001000"
9="001001" ...
...
19="001001"
20="010100"
And that's only 6 bits to store each X vector entry for integers from -10 to 20
For search purposes on a range of -10 to 20, there are 31 different X Vector entries
possible to search through.
Each entry in Vector Y COULD be stored in 9 bits (no extra sign bit is needed)
The 1's and 0's COULD be stored (accessed, really) in 2 parts
(the integer part, and a 2-digit decimal part).
Part 1 can be 0, 1, 2, or 3 (4 two-bit values, from "00" to "11")
However, if the range of the entire Y dataset is 0 to 10,
part 1 can be 0, 1, ...9, 10 (which is 11 four-bit values,
from "0000" to "1010")
Part 2 can be 00, 01, ...98, 99 (100 seven-bit values, from "0000000" to "1100100")
Total storage bits for Vector Y entries is 4 + 7 = 11 bits in the
range 00.00 to 10.99
For search purposes on a range 00.00 to 10.99, there are 1100 different Y Vector
entries possible to search through (11 x 100)
Each entry in Vector Z in the range of 0 to 50 COULD be stored in 6 bits
("000000" to "110010").
Again, the actual data range may be 7 bits long (for simplicity's sake)
0 to 64 ("0000000" to "1000000")
For search purposes on a range of 0 to 64, there are 65 different Z Vector entries
possible to search through.
Consider that you will be storing the data in this optimized format, in a single
succession of bits:
X=4 bits + 2 range bits = 6 bits
+ Y=4 bits part 1 and 7 bits part 2 = 11 bits
+ Z=7 bits
+ timestamp (10 numbers - each from 0 to 9 ("0000" to "1001") 4 bits each = 40 bits)
= TOTAL BITS: 6 + 11 + 7 + 40 = 64 stored bits for each 4D vector
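A sketch of how that 64-bit layout could be packed with plain shifts (my own field order; the answer stores the timestamp digit by digit, but a 10-digit number also fits the same 40 bits as one integer):
#include <cstdint>
#include <cstdlib>
// X: 6 bits sign-magnitude (-31..31), Y: 11 bits (0.00..10.99 as hundredths),
// Z: 7 bits (0..64), timestamp: low 40 bits.
uint64_t pack(int x, double y, int z, uint64_t timestamp)
{
    uint64_t xf = (x < 0 ? 0x20u : 0u) | (uint64_t)std::abs(x);
    uint64_t yf = (uint64_t)(y * 100.0 + 0.5);
    uint64_t zf = (uint64_t)z;
    return (xf << 58) | (yf << 47) | (zf << 40) | (timestamp & ((1ULL << 40) - 1));
}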
THE SEARCH:
Input xx, yy, zz to search for in arrays X, Y and Z (which are stored in binary)
Change xx, yy, and zz to binary bit strings per optimized format above.
function(xx, yy, zz)
Search for X first, since it has 31 possible outcomes (range is -10 to 20)
- the lowest number of any array
First search for positive targets (there are 8 of them and better chance
of finding one)
These all start with "000"
7="000111"
6="000110"
5="000101"
4="000100"
3="000011"
2="000010"
1="000001"
0="000000"
So you can check if the first 3 bits = "000". If so, you have a number
between 0 and 7.
Found: search for Z
Else search for xx=-2 or -1: does X = -2="100010" or -1="100001" ?
(do second because there are only 2 of them)
Found: Search for Z
NotFound: next X
Search for Z after X is Found: (Z second, since it has 65 possible outcomes
- range is 0 to 64)
You are searching for 6 bits of a 7 bit binary number
("0000000" to "1000000") If bits 1,2,3,4,5,6 are all "0", analyze bit 0.
If it is "1" (it's 64), next Z
Else begin searching 6 bits ("000000" to "110010") with LSB first
Found: Search for Y
NotFound: Next X
Search for Y (Y last, since it has 1100 possible outcomes - range is 0.00 to 10.99)
Search for Part 1 (decimal place) bits (you are searching for
"0000", "0001" or "0011" only, so use yyPt1=YPt1)
Found: Search for Part 2 ("0000000" to "1100100") using yyPt2=YPt2
(direct comparison)
Found: Print out X, Y, Z, and timestamp
NotFound: Search criteria for X, Y, and Z not found in data.
Print X,Y,Z,"timestamp not found". Ask for new X, Y, Z. New search.
Recently I found this interesting thing in webkit sources, related to color conversions (hsl to rgb):
http://osxr.org/android/source/external/webkit/Source/WebCore/platform/graphics/Color.cpp#0111
const double scaleFactor = nextafter(256.0, 0.0); // it's here something like 255.99999999999997
// .. some code skipped
return makeRGBA(static_cast<int>(calcSomethingFrom0To1(blablabla) * scaleFactor),
Same I found here: http://www.filewatcher.com/p/kdegraphics-4.6.0.tar.bz2.5101406/kdegraphics-4.6.0/kolourpaint/imagelib/effects/kpEffectHSV.cpp.html
(int)(value * 255.999999)
Is it correct to use such a technique at all? Why not use something straightforward like round(blabla * 255)?
Is it a feature of C/C++? As far as I can see, strictly speaking it will not always return correct results: in 27 cases out of 100. See the spreadsheet at https://docs.google.com/spreadsheets/d/1AbGnRgSp_5FCKAeNrELPJ5j9zON9HLiHoHC870PwdMc/edit?usp=sharing
Somebody please explain; I think it should be something basic.
Normally we want to map a real value x in the (closed) interval [0,1] to an integer value j in the range [0 ...255].
And we want to do it in a "fair" way, so that, if the reals are uniformly distributed in the range, the discrete values will be approximately equiprobable: each of the 256 discrete values should get "the same share" (1/256) from the [0,1] interval. That is, we want a mapping like this:
[0 , 1/256) -> 0
[1/256, 2/256) -> 1
...
[254/256, 255/256) -> 254
[255/256, 1] -> 255
We are not much concerned about the transition points [*], but we do want to cover the full range [0,1]. How do we accomplish that?
If we simply do j = (int)(x * 255): the value 255 would almost never appear (only when x=1), and the rest of the values 0...254 would each get a share of 1/255 of the interval. This would be unfair, regardless of the rounding behaviour at the limit points.
If we instead do j = (int)(x * 256): this partition would be fair, except for a single problem: we would get the value 256 (out of range!) when x=1 [**].
That's why j = (int)(x * 255.9999...) (where 255.9999... is actually the largest double less than 256) will do.
An alternative implementation (also reasonable, almost equivalent) would be
j = (int)(x * 256);
if(j == 256) j = 255;
// j = x == 1.0 ? 255 : (int)(x * 256); // alternative
but this would be more clumsy and probably less efficient.
round() does not help here. For example, j = (int)round(x * 255) would give a 1/255 share to the integers j=1...254 and half that value to the extreme points j=0, j=255.
[*] I mean: we are not extremely interested in what happens in the 'small' neighbourhood of, say, 3/256: rounding might give 2 or 3, it doesn't matter. But we are interested in the extrema: we want to get 0 and 255 for x=0 and x=1, respectively.
[**] The IEEE floating point standard guarantees that there's no rounding ambiguity here: integers admit an exact floating point representation, the product will be exact, and the casting will give always 256. Further, we are guaranteed that 1.0 * z = z.
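The effect is easy to check numerically. A small test program (mine, not from the answer) that sweeps x uniformly over [0,1] and tallies how often each j comes out:
#include <cmath>
#include <cstdio>
int main()
{
    const double scale = std::nextafter(256.0, 0.0);    // largest double below 256
    long a[256] = {0}, b[256] = {0}, c[256] = {0};
    const long N = 10000000;
    for (long i = 0; i <= N; ++i) {
        double x = (double)i / N;
        a[(int)(x * scale)]++;                           // (int)(x * 255.999...)
        int j = (int)(x * 256); if (j == 256) j = 255;   // clamped variant
        b[j]++;
        c[(int)std::lround(x * 255)]++;                  // round() variant
    }
    std::printf("j=0:   %ld %ld %ld\n", a[0], b[0], c[0]);
    std::printf("j=128: %ld %ld %ld\n", a[128], b[128], c[128]);
    std::printf("j=255: %ld %ld %ld\n", a[255], b[255], c[255]);
}
With the first two mappings all 256 counts come out essentially equal; with round(), j=0 and j=255 get roughly half the count of the interior values.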
In general, I'd say (int)(blabla * 255.99999999999997) is more correct than using round().
Why?
Because with round(), 0 and 255 only have "half" the range that 1-254 do. If you round(), then 0-0.00196078431 gets mapped to 0, while 0.00196078431-0.00588235293 gets mapped to 1. This means that 1 is twice as likely to occur as 0, which is, strictly speaking, an unfair bias.
If, instead, one multiplies by 255.99999999999997 and then floors (which is what casting to an integer does, since it truncates), then each integer from 0 to 255 is equally likely.
Your spreadsheet might show this better if it counted in fractional percentages (i.e. if it counted by 0.01% instead of 1% each time). I've made a simple spreadsheet to show this. If you look at that spreadsheet, you'll see that 0 is unfairly biased against when round()ing, but with the other method things are fair and equal.
Casting to int has the same effect as the floor function (i.e. it truncates). When you call round, it, well, rounds to the nearest integer.
They do different things, so choose the one you need.
I have hundreds of thousands of sparse bit strings of length 32 bits.
I'd like to do a nearest neighbour search on them, and look-up performance is critical. I've been reading up on various algorithms, but they seem to target text strings rather than binary strings. I think either locality-sensitive hashing or spectral hashing seems a good candidate, or I could look into compression. Will any of these work well for my bit string problem? Any direction or guidance would be greatly appreciated.
Here's a fast and easy method,
then a variant with better performance at the cost of more memory.
In: array Uint X[], e.g. 1M 32-bit words
Wanted: a function near( Uint q ) --> j with small hammingdist( q, X[j] )
Method: binary search q in sorted X,
then linear search a block around that.
Pseudocode:
def near( q, X, Blocksize=100 ):
preprocess: sort X
Uint* p = binsearch( q, X ) # match q in leading bits
linear-search Blocksize words around p
return the hamming-nearest of these.
This is fast --
Binary search 1M words
+ nearest hammingdist in a block of size 100
takes < 10 us on my Mac ppc.
(This is highly cache-dependent — your mileage will vary.)
How close does this come to finding the true nearest X[j] ?
I can only experiment, can't do the math:
for 1M random queries in 1M random words,
the nearest match is on average 4-5 bits away,
vs. 3 away for the true nearest (linear scan all 1M):
near32 N 1048576 Nquery 1048576 Blocksize 100
binary search, then nearest +- 50
7 usec
distance distribution: 0 4481 38137 185212 443211 337321 39979 235 0
near32 N 1048576 Nquery 100 Blocksize 1048576
linear scan all 1048576
38701 usec
distance distribution: 0 0 7 58 35 0
Run your data with blocksizes say 50 and 100
to see how the match distances drop.
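In C++ the base method is only a few lines. A sketch under the same assumptions (X sorted and non-empty; std::popcount is C++20, __builtin_popcount works as well):
#include <algorithm>
#include <bit>
#include <cstdint>
#include <vector>
uint32_t near(uint32_t q, const std::vector<uint32_t>& X, long blocksize = 100)
{
    long mid = std::lower_bound(X.begin(), X.end(), q) - X.begin();
    long lo = std::max(0L, mid - blocksize / 2);              // block around the
    long hi = std::min((long)X.size(), mid + blocksize / 2);  // binary-search hit
    uint32_t best = X[lo];
    int bestDist = std::popcount(q ^ best);
    for (long i = lo + 1; i < hi; ++i) {
        int d = std::popcount(q ^ X[i]);                      // Hamming distance
        if (d < bestDist) { bestDist = d; best = X[i]; }
    }
    return best;
}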
To get even nearer, at the cost of twice the memory,
make a copy Xswap of X with upper / lower halfwords swapped,
and return the better of
near( q, X, Blocksize )
near( swap q, Xswap, Blocksize )
With lots of memory, one can use many more bit-shuffled copies of X,
e.g. 32 rotations.
I have no idea how performance varies with Nshuffle and Blocksize —
a question for LSH theorists.
(Added): To near-match bit strings of say 320 bits, 10 words,
make 10 arrays of pointers, sorted on word 0, word 1 ...
and search blocks with binsearch as above:
nearest( query word 0, Sortedarray0, 100 ) -> min Hammingdist e.g. 42 of 320
nearest( query word 1, Sortedarray1, 100 ) -> min Hammingdist 37
nearest( query word 2, Sortedarray2, 100 ) -> min Hammingdist 50
...
-> e.g. the 37.
This will of course miss near-matches where no single word is close,
but it's very simple, and sort and binsearch are blazingly fast.
The pointer arrays take exactly as much space as the data bits.
100 words, 3200 bits would work in exactly the same way.
But: this works only if there are roughly equal numbers of 0 bits and 1 bits,
not 99 % 0 bits.
I just came across a paper that addresses this problem.
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering (Ravichandran et al, 2005)
The basic idea is similar to Denis's answer (sort lexicographically by different permutations of the bits) but it includes a number of additional ideas and further references for articles on the topic.
It is actually implemented in https://github.com/soundcloud/cosine-lsh-join-spark which is where I found it.