I'm trying to do some analysis over a simple Fraction class and I want some data to compare that type with doubles.
The problem
Right know I'm looking for some good way to get the density of Fractions between 2 numbers. Fractions is basically 2 integers (e.g. pair< long, long>), and the density between s and t is the amount of representable numbers in that range. And it needs to be an exact, or very good approximation done in O(1) or very fast.
To make it a bit simpler, let's say I want all the numbers (not fractions) a/b between s and t, where 0 <= s <= a/b < t <= M, and 0 <= a,b <= M (b > 0, a and b are integers)
Example
If my fractions were of a data type which only count to 6 (M = 6), and I want the density between 0 and 1, the answer would be 12. Those numbers are:
0, 1/6, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6.
What I thought already
A very naive approach would be to cycle trough all the possible fractions, and count those which can't be simplified. Something like:
long fractionsIn(double s, double t){
long density = 0;
long M = LONG_MAX;
for(int d = 1; d < floor(M/t); d++){
for(int n = ceil(d*s); n < M; n++){
if( gcd(n,d) == 1 )
density++;
}
}
return density;
}
But gcd() is very slow so it doesn't works. I also try doing some math but i couldn't get to anything good.
Solution
Thanks to #m69 answer, I made this code for Fraction = pair<Long,Long>:
//this should give the density of fractions between first and last, or less.
double fractionsIn(unsigned long long first, unsigned long long last){
double pi = 3.141592653589793238462643383279502884;
double max = LONG_MAX; //i can't use LONG_MAX directly
double zeroToOne = max/pi * max/pi * 3; // = approx. amount of numbers in Farey's secuence of order LONG_MAX.
double res = 0;
if(first == 0){
res = zeroToOne;
first++;
}
for(double i = first; i < last; i++){
res += zeroToOne/(i * i+1);
if(i == i+1)
i = nextafter(i+1, last); //if this happens, i might not count some fractions, but i have no other choice
}
return floor(res);
}
The main change is nextafter, which is important with big numbers (1e17)
The result
As I explain at the begining, I was trying to compare Fractions with double. Here is the result for Fraction = pair<Long,Long> (and here how I got the density of doubles):
Density between 0,1: | 1,2 | 1e6,1e6+1 | 1e14,1e14+1 | 1e15-1,1e15 | 1e17-10,1e17 | 1e19-10000,1e19 | 1e19-1000,1e19
Doubles: 4607182418800017408 | 4503599627370496 | 8589934592 | 64 | 8 | 1 | 5 | 0
Fraction: 2.58584e+37 | 1.29292e+37 | 2.58584e+25 | 2.58584e+09 | 2.58584e+07 | 2585 | 1 | 0
Density between 0 and 1
If the integers with which you express the fractions are in the range 0~M, then the density of fractions between the values 0 (inclusive) and 1 (exclusive) is:
M: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0~(1): 1 2 4 6 10 12 18 22 28 32 42 46 58 64 72 80 96 102 120 128 140 150 172 180 200 212 230 242 270 278 308 ...
This is sequence A002088 on OEIS. If you scroll down to the formula section, you'll find information about how to approximate it, e.g.:
Φ(n) = (3 ÷ π2) × n2 + O[n × (ln n)2/3 × (ln ln n)4/3]
(Unfortunately, no more detail is given about the constants involved in the O[x] part. See discussion about the quality of the approximation below.)
Distribution across range
The interval from 0 to 1 contains half of the total number of unique fractions that can be expressed with numbers up to M; e.g. this is the distribution when M = 15 (i.e. 4-bit integers):
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
72 36 12 6 4 2 2 2 1 1 1 1 1 1 1 1
for a total of 144 unique fractions. If you look at the sequence for different values of M, you'll see that the steps in this sequence converge:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1: 1 1
2: 2 1 1
3: 4 2 1 1
4: 6 3 1 1 1
5: 10 5 2 1 1 1
6: 12 6 2 1 1 1 1
7: 18 9 3 2 1 1 1 1
8: 22 11 4 2 1 1 1 1 1
9: 28 14 5 2 2 1 1 1 1 1
10: 32 16 5 3 2 1 1 1 1 1 1
11: 42 21 7 4 2 2 1 1 1 1 1 1
12: 46 23 8 4 2 2 1 1 1 1 1 1 1
13: 58 29 10 5 3 2 2 1 1 1 1 1 1 1
14: 64 32 11 5 4 2 2 1 1 1 1 1 1 1 1
15: 72 36 12 6 4 2 2 2 1 1 1 1 1 1 1 1
Not only is the density between 0 and 1 half of the total number of fractions, but the density between 1 and 2 is a quarter, and the density between 2 and 3 is close to a twelfth, and so on.
As the value of M increases, the distribution of fractions across the ranges 0-1, 1-2, 2-3 ... converges to:
1/2, 1/4, 1/12, 1/24, 1/40, 1/60, 1/84, 1/112, 1/144, 1/180, 1/220, 1/264 ...
This sequence can be calculated by starting with 1/2 and then:
0-1: 1/2 x 1/1 = 1/2
1-2: 1/2 x 1/2 = 1/4
2-3: 1/4 x 1/3 = 1/12
3-4: 1/12 x 2/4 = 1/24
4-5: 1/24 x 3/5 = 1/40
5-6: 1/40 x 4/6 = 1/60
6-7: 1/60 x 5/7 = 1/84
7-8: 1/84 x 6/8 = 1/112
8-9: 1/112 x 7/9 = 1/144 ...
You can of course calculate any of these values directly, without needing the steps inbetween:
0-1: 1/2
6-7: 1/2 x 1/6 x 1/7 = 1/84
(Also note that the second half of the distribution sequence consists of 1's; these are all the integers divided by 1.)
Approximating the density in given interval
Using the formulas provided on the OEIS page, you can calculate or approximate the density in the interval 0-1, and multiplied by 2 this is the total number of unique values that can be expressed as fractions.
Given two values s and t, you can then calculate and sum the densities in the intervals s ~ s+1, s+1 ~ s+2, ... t-1 ~ t, or use an interpolation to get a faster but less precise approximate value.
Example
Let's assume that we're using 10-bit integers, capable of expressing values from 0 to 1023. Using this table linked from the OEIS page, we find that the density between 0~1 is 318452, and the total number of fractions is 636904.
If we wanted to find the density in the interval s~t = 100~105:
100~101: 1/2 x 1/100 x 1/101 = 1/20200 ; 636904/20200 = 31.53
101~102: 1/2 x 1/101 x 1/102 = 1/20604 ; 636904/20604 = 30.91
102~103: 1/2 x 1/102 x 1/103 = 1/21012 ; 636904/21012 = 30.31
103~104: 1/2 x 1/103 x 1/104 = 1/21424 ; 636904/21424 = 29.73
104~105: 1/2 x 1/104 x 1/105 = 1/21840 ; 636904/21840 = 29.16
Rounding these values gives the sum:
32 + 31 + 30 + 30 + 29 = 152
A brute force algorithm gives this result:
32 + 32 + 30 + 28 + 28 = 150
So we're off by 1.33% for this low value of M and small interval with just 5 values. If we had used linear interpolation between the first and last value:
100~101: 31.53
104~105: 29.16
average: 30.345
total: 151.725 -> 152
we'd have arrived at the same value. For larger intervals, the sum of all the densities will probably be closer to the real value, because rounding errors will cancel each other out, but the results of linear interpolation will probably become less accurate. For ever larger values of M, the calculated densities should converge with the actual values.
Quality of approximation of Φ(n)
Using this simplified formula:
Φ(n) = (3 ÷ π2) × n2
the results are almost always smaller than the actual values, but they are within 1% for n ≥ 182, within 0.1% for n ≥ 1880 and within 0.01% for n ≥ 19494. I would suggest hard-coding the lower range (the first 50,000 values can be found here), and then using the simplified formula from the point where the approximation is good enough.
Here's a simple code example with the first 182 values of Φ(n) hard-coded. The approximation of the distribution sequence seems to add an error of a similar magnitude as the approximation of Φ(n), so it should be possible to get a decent approximation. The code simply iterates over every integer in the interval s~t and sums the fractions. To speed up the code and still get a good result, you should probably calculate the fractions at several points in the interval, and then use some sort of non-linear interpolation.
function fractions01(M) {
var phi = [0,1,2,4,6,10,12,18,22,28,32,42,46,58,64,72,80,96,102,120,128,140,150,172,180,200,212,230,242,270,278,308,
324,344,360,384,396,432,450,474,490,530,542,584,604,628,650,696,712,754,774,806,830,882,900,940,964,1000,
1028,1086,1102,1162,1192,1228,1260,1308,1328,1394,1426,1470,1494,1564,1588,1660,1696,1736,1772,1832,1856,
1934,1966,2020,2060,2142,2166,2230,2272,2328,2368,2456,2480,2552,2596,2656,2702,2774,2806,2902,2944,3004,
3044,3144,3176,3278,3326,3374,3426,3532,3568,3676,3716,3788,3836,3948,3984,4072,4128,4200,4258,4354,4386,
4496,4556,4636,4696,4796,4832,4958,5022,5106,5154,5284,5324,5432,5498,5570,5634,5770,5814,5952,6000,6092,
6162,6282,6330,6442,6514,6598,6670,6818,6858,7008,7080,7176,7236,7356,7404,7560,7638,7742,7806,7938,7992,
8154,8234,8314,8396,8562,8610,8766,8830,8938,9022,9194,9250,9370,9450,9566,9654,9832,9880,10060];
if (M < 182) return phi[M];
return Math.round(M * M * 0.30396355092701331433 + M / 4); // experimental; see below
}
function fractions(M, s, t) {
var half = fractions01(M);
var frac = (s == 0) ? half : 0;
for (var i = (s == 0) ? 1 : s; i < t && i <= M; i++) {
if (2 * i < M) {
var f = Math.round(half / (i * (i + 1)));
frac += (f < 2) ? 2 : f;
}
else ++frac;
}
return frac;
}
var M = 1023, s = 100, t = 105;
document.write(fractions(M, s, t));
Comparing the approximation of Φ(n) with the list of the 50,000 first values suggests that adding M÷4 is a workable substitute for the second part of the formula; I have not tested this for larger values of n, so use with caution.
Blue: simplified formula. Red: improved simplified formula.
Quality of approximation of distribution
Comparing the results for M=1023 with those of a brute-force algorithm, the errors are small in real terms, never more than -7 or +6, and above the interval 205~206 they are limited to -1 ~ +1. However, a large part of the range (57~1024) has fewer than 100 fractions per integer, and in the interval 171~1024 there are only 10 fractions or fewer per integer. This means that small errors and rounding errors of -1 or +1 can have a large impact on the result, e.g.:
interval: 241 ~ 250
fractions/integer: 6
approximation: 5
total: 50 (instead of 60)
To improve the results for intervals with few fractions per integer, I would suggest combining the method described above with a seperate approach for the last part of the range:
Alternative method for last part of range
As already mentioned, and implemented in the code example, the second half of the range, M÷2 ~ M, has 1 fraction per integer. Also, the interval M÷3 ~ M÷2 has 2; the interval M÷4 ~ M÷3 has 4. This is of course the Φ(n) sequence again:
M/2 ~ M : 1
M/3 ~ M/2: 2
M/4 ~ M/3: 4
M/5 ~ M/4: 6
M/6 ~ M/5: 10
M/7 ~ M/6: 12
M/8 ~ M/7: 18
M/9 ~ M/8: 22
M/10 ~ M/9: 28
M/11 ~ M/10: 32
M/12 ~ M/11: 42
M/13 ~ M/12: 46
M/14 ~ M/13: 58
M/15 ~ M/14: 64
M/16 ~ M/15: 72
M/17 ~ M/16: 80
M/18 ~ M/17: 96
M/19 ~ M/18: 102 ...
Between these intervals, one integer can have a different number of fractions, depending on the exact value of M, e.g.:
interval fractions
202 ~ 203 10
203 ~ 204 10
204 ~ 205 9
205 ~ 206 6
206 ~ 207 6
The interval 204 ~ 205 lies on the edge between intervals, because M ÷ 5 = 204.6; it has 6 + 3 = 9 fractions because M modulo 5 is 3. If M had been 1022 or 1024 instead of 1023, it would have 8 or 10 fractions. (This example is straightforward because 5 is a prime; see below.)
Again, I would suggest using the hard-coded values for Φ(n) to calculate the number of fractions for the last part of the range. If you use the first 17 values as listed above, this covers the part of the range with fewer than 100 fractions per integer, so that would reduce the impact of rounding errors below 1%. The first 56 values would give you 0.1%, the first 182 values 0.01%.
Together with the values of Φ(n), you could hard-code the number of fractions of the edge intervals for each modulo value, e.g.:
modulo: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
M/ 2 1 2
M/ 3 2 3 4
M/ 4 4 5 5 6
M/ 5 6 7 8 9 10
M/ 6 10 11 11 11 11 12
M/ 7 12 13 14 15 16 17 18
M/ 8 18 19 19 20 20 21 21 22
M/ 9 22 23 24 24 25 26 26 27 28
M/10 28 29 29 30 30 30 30 31 31 32
M/11 32 33 34 35 36 37 38 39 40 41 42
M/12 42 43 43 43 43 44 44 45 45 45 45 46
M/13 46 47 48 49 50 51 52 53 54 55 56 57 58
M/14 58 59 59 60 60 61 61 61 61 62 62 63 63 64
M/15 64 65 66 66 67 67 67 68 69 69 69 70 70 71 72
M/16 72 73 73 74 74 75 75 76 76 77 77 78 78 79 79 80
M/17 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
M/18 96 97 97 97 97 98 98 99 99 99 99 100 100 101 101 101 101 102
This is exactly the same as: (Sum of phi(k)) where m <= k <= M where phi(k) is the Euler Totient Function and with phi(0) = 1 (as defined by the problem). There is no known closed form for this sum. However there are many optimizations known as mentioned in the wiki link. This is known as the Totient Summatory Function in Wolfram. The same website also links to the series: A002088 and provides a few asymptotic approximations.
The reasoning is this: consider the number of values of the form {1/M, 2/M, ...., (M-1)/M, M/M}. All those fractions that will be reducible to a smaller value will not be counted in phi(M) because they are not relatively prime. They will appear in the summation of another totient.
For example, phi(6) = 12 and you have 1 + phi(6), since you also count the 0.
I'm trying to search a big project for all examples of where I've declared an array with [48] as the size or any multiples of 48.
Can I use a regular expression function to find matches of 48 * n?
Thanks.
Here you go (In PHP's PCRE syntax):
^(0*|(1(01*?0)*?1|0)+?0{4})$
Usage:
preg_match('/^(0*|(1(01*?0)*?1|0)+?0{4})$/', decbin($number));
Now, why it works:
Well we know that 48 is really just 3 * 16. And 16 is just 2*2*2*2. So, any number divisible by 2^4 will have the 4 most bits in its binary representation 0. So by ending the regexp with 0{4}$ is equivalent to saying that the number is divisible by 2^4 (or 16). So then, the bits to the left need to be divisible by 3. So using the regexp from this answer, we can tell if they are divisible by 3. So if the whole regexp matches, the number is divisible by both 3 and 16, and hence 48...
QED...
(Note, the leading 0| case handles the failed match when $number is 0). I've tested this on all numbers from 0 to 48^5, and it correctly matches each time...
A generalization of your question is asking whether x is a string representing a multiple of n in base b. This is the same thing as asking whether the remainder of x divided by n is 0. You can easily create a DFA to compute this.
Create a DFA with n states, numbered from 0 to n - 1. State 0 is both the initial state and the sole accepting state. Each state will have b outgoing transitions, one for each symbol in the alphabet (since base-b gives you b digits to work with).
Each state represents the remainder of the portion of x we've seen so far, divided by n. This is why we have n of them (dividing a number by n yields a remainder in the range 0 to n - 1), and also why state 0 is the accepting state.
Since the digits of x are processed from left to right, if we have a number y from the first few digits of x and read the digit d, we get the new value of y from yb + d. But more importantly, the remainder r changes to (rb + d) mod n. So we now know how to connect the transition arcs and complete the DFA.
You can do this for any n and b. Here, for example, is one that accepts multiples of 18 in base-10 (states on the rows, inputs on the columns):
| 0 1 2 3 4 5 6 7 8 9
---+-------------------------------
→0 | 0 1 2 3 4 5 6 7 8 9 ←accept
1 | 10 11 12 13 14 15 16 17 0 1
2 | 2 3 4 5 6 7 8 9 10 11
3 | 12 13 14 15 16 17 0 1 2 3
4 | 4 5 6 7 8 9 10 11 12 13
5 | 14 15 16 17 0 1 2 3 4 5
6 | 6 7 8 9 10 11 12 13 14 15
7 | 16 17 0 1 2 3 4 5 6 7
8 | 8 9 10 11 12 13 14 15 16 17
9 | 0 1 2 3 4 5 6 7 8 9
10 | 10 11 12 13 14 15 16 17 0 1
11 | 2 3 4 5 6 7 8 9 10 11
12 | 12 13 14 15 16 17 0 1 2 3
13 | 4 5 6 7 8 9 10 11 12 13
14 | 14 15 16 17 0 1 2 3 4 5
15 | 6 7 8 9 10 11 12 13 14 15
16 | 16 17 0 1 2 3 4 5 6 7
17 | 8 9 10 11 12 13 14 15 16 17
These get really tedious as n and b get larger, but you can obviously write a program to generate them for you no problem.
1|48|2304|110592|5308416
You are unlikely to have declared an array of size 48^5 or larger.
No, regular expressions can't calculate multiples (except in the unary number system: decimal 4 = unary 1111; decimal 8 = unary 11111111, so the regex ^(1111)+$ matches multiples of 4).
import re
# For real example,
# construction of a chain with integers multiples of 48
# and integers not multiple of 48.
from random import *
w = [ 48*randint( 1,10) for j in xrange(10) ]
w.extend( 48*randint(11,20) for j in xrange(10) )
w.extend( 48*randint(21,70) for j in xrange(10) )
a = [ el if el%48!=0 else el+1 for el in sample(xrange(1000),40) ]
w.extend(a)
shuffle(w)
texte = [ ''.join(sample(' abcdefghijklmonopqrstuvwxyz',randint(1,7))) for i in xrange(40) ]
X = ''.join(texte[i]+str(w[i]) for i in xrange(40))
# Searching the multiples of 48 in the chain X
def mult48(match):
g1 = match.group()
if int(g1)%48==0:
return ( g1, X[0:match.end()] )
else:
return ( g1, 'not multiple')
for match in re.finditer('\d+',X):
print '%s %s\n' % mult48(match)
Any multiple is difficult, but here's a (python-style) regexp that matches the first 200 multiples of 48.
0$|1(?:0(?:08$|56$)|1(?:04$|52$)|2(?:00$|48$|96$)|3(?:44$|92$)|4(?:4(?:$|0$)|88$\
)|5(?:36$|84$)|6(?:32$|80$)|7(?:28$|76$)|8(?:24$|72$)|9(?:2(?:$|0$)|68$))|2(?:0(\
?:16$|64$)|1(?:12$|60$)|2(?:08$|56$)|3(?:04$|52$)|4(?:0(?:$|0$)|48$|96$)|5(?:44$\
|92$)|6(?:40$|88$)|7(?:36$|84$)|8(?:32$|8(?:$|0$))|9(?:28$|76$))|3(?:0(?:24$|72$\
)|1(?:20$|68$)|2(?:16$|64$)|3(?:12$|6(?:$|0$))|4(?:08$|56$)|5(?:04$|52$)|6(?:00$\
|48$|96$)|7(?:44$|92$)|8(?:4(?:$|0$)|88$)|9(?:36$|84$))|4(?:0(?:32$|80$)|1(?:28$\
|76$)|2(?:24$|72$)|3(?:2(?:$|0$)|68$)|4(?:16$|64$)|5(?:12$|60$)|6(?:08$|56$)|7(?\
:04$|52$)|8(?:$|0(?:$|0$)|48$|96$)|9(?:44$|92$))|5(?:0(?:40$|88$)|1(?:36$|84$)|2\
(?:32$|8(?:$|0$))|3(?:28$|76$)|4(?:24$|72$)|5(?:20$|68$)|6(?:16$|64$)|7(?:12$|6(\
?:$|0$))|8(?:08$|56$)|9(?:04$|52$))|6(?:0(?:00$|48$|96$)|1(?:44$|92$)|2(?:4(?:$|\
0$)|88$)|3(?:36$|84$)|4(?:32$|80$)|5(?:28$|76$)|6(?:24$|72$)|7(?:2(?:$|0$)|68$)|\
8(?:16$|64$)|9(?:12$|60$))|7(?:0(?:08$|56$)|1(?:04$|52$)|2(?:0(?:$|0$)|48$|96$)|\
3(?:44$|92$)|4(?:40$|88$)|5(?:36$|84$)|6(?:32$|8(?:$|0$))|7(?:28$|76$)|8(?:24$|7\
2$)|9(?:20$|68$))|8(?:0(?:16$|64$)|1(?:12$|6(?:$|0$))|2(?:08$|56$)|3(?:04$|52$)|\
4(?:00$|48$|96$)|5(?:44$|92$)|6(?:4(?:$|0$)|88$)|7(?:36$|84$)|8(?:32$|80$)|9(?:2\
8$|76$))|9(?:0(?:24$|72$)|1(?:2(?:$|0$)|68$)|2(?:16$|64$)|3(?:12$|60$)|4(?:08$|5\
6$)|5(?:04$|52$)|6(?:$|0$))