I'm trying to implement an algorithm for generating graphs following Barabási-Albert (BA) model. Under this model, the degree distribution follows a power-law:
P(k) ~ k^-λ
Where the exponent λ should equal 3.
For simplicity, I will focus here on the R code, where I'm using igraph functions. However I get networks with λ != 3. It seems that this has been a topic extensively covered (example question 1, eq2, eq3), but I haven't been able to find a satisfactory solution.
In R I use the igraph::sample_pa function to generate a graph following the BA model. In the reproducible example below, I set power = 3 and m = 8:
# Initialize
library(igraph)
set.seed(1234)
order = 100
v_degrees = vector()
for (i in 1:10000) {
g <- sample_pa(order, power=3, m=8)
# Get degree distribution
d = degree(g, mode="all")
dd = degree_distribution(g, mode="all", cumulative=FALSE)
d = 1:max(d)
probability = dd[-1]
nonzero.position = which(probability !=0)
probability = probability[nonzero.position]
d = d[nonzero.position]
# Fit power law distribution and get gamma exponent
reg = lm (log(probability) ~ log(d))
cozf = coef(reg)
power.law.fit = function(x) exp(cozf[[1]] + cozf[[2]] * log(x))
gamma = -cozf[[2]]
v_degrees[i] = gamma
}
The graph seems to be scale free in fact, giving gamma=0.72±0.21 with order 100 and gamma=0.68±0.24 for order 10,000, and similar results varying the parameter m. But the exponent is clearly different from the expected gamma=3.
In fact, I was trying to implement this model in a different language (C++, see code below), but I get similar results, with exponents lower than 3. So I wonder whether this is a common misunderstanding of the BA model, whether there is something wrong in the calculations above that fit the power-law distribution, or whether, contrary to what is commonly expected, this is the normal behavior of the BA model.
In case someone is interested or is more familiar with C++, see the appendix below.
Appendix: C++ code
To understand the code below, assume a class Graph and a connect function that creates an edge between the two vertices passed as arguments. Below I give the code of the two relevant functions, BA_step and build_BA.
BA_step
void Graph::BA_step (int ID, int m, std::vector<double>& freqs) {
std::vector<int> connect_history;
vertices.push_back(ID);
// Connect node ID to a random node i with pi ~ ki / sum kj
while (connect_history.size() < m) {
double U (sample_prob()); // gets a value in the range [0,1)
int index (freqs[freqs.size()-1]);
for (int i(0); i<freqs.size(); ++i) {
if (U<=freqs[i]/index && !is_in(connect_history, i)) { // is_in checks if i exists in connect_history
connect(ID, i);
connect_history.push_back(i);
break;
}
}
}
// Update vector of absolute edge frequencies
for (int i(0); i<connect_history.size(); ++i) {
int index (connect_history[i]);
for (int j(index); j<freqs.size(); ++j) {
++freqs[j];
}
}
freqs.push_back(m+freqs[freqs.size()-1]);
}
build_BA
void Graph::build_BA (int m0, int m) {
// Initialization
std::vector<double> cum_nedges;
std::vector<int> connect_history;
for (int ID(0); ID<m0; ++ID) {
vertices.push_back(ID);
}
// Initial BA step
vertices.push_back(m0);
for (int i(0); i<m; ++i) {
connect(m0, i);
connect_history.push_back(i);
}
cum_nedges.push_back(1);
for (int i(1); i<m; ++i) cum_nedges.push_back(cum_nedges[cum_nedges.size()-1]+1);
cum_nedges.push_back(m+m);
// BA model
for (int ID(m0+1); ID<order; ++ID) {
BA_step(ID, m, cum_nedges);
}
}
Two things might help:
sample_pa arguments to get exponent alpha = 3
The arguments you want are really power = 1 and m = 1 (check the definition in that Wikipedia article against the igraph::sample_pa documentation; the power argument is the exponent of the preferential attachment, not the exponent of the resulting power-law degree distribution).
Power laws are hard to estimate
Just running OLS/LM on the degree distribution gives you an exponent closer to 0 than 3 (underestimated, in other words). Instead, if you use the igraph::fit_power_law function with a high xmin, you'll get answers closer to 3. Check Aaron Clauset's page and publications for more info on estimating power laws. Really you need to estimate an optimal xmin for every degree distribution.
Here's some code that'll work a bit better:
library(igraph)
set.seed(1234)
order = 10000
v_degrees = vector()
for (i in 1:100) {
g <- sample_pa(order, power = 1, m = 1)
d <- degree(g, mode="all")
v_degrees[i] <- fit_power_law(d, ceiling(mean(d))+100) %>% .$alpha
}
v_degrees %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.646 2.806 2.864 2.873 2.939 3.120
Note that I make up the x-min to use (ceiling(mean(d))+100). Changing that will change your answers.
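Since the question also has a C++ implementation, here is a minimal C++ sketch of the kind of tail estimate discussed above, assuming the continuous maximum-likelihood formula alpha = 1 + n / sum(ln(x_i / xmin)) described in Clauset's publications. The names power_law_alpha_mle and sample_power_law are made up for illustration, and this is not a description of what igraph does internally. Degrees are integers, so for real degree data this continuous formula is only an approximation.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Continuous maximum-likelihood estimate of the power-law exponent,
// using only the tail x >= xmin:
//   alpha = 1 + n / sum(ln(x_i / xmin))
// Returns NaN when the tail is empty or degenerate.
double power_law_alpha_mle(const std::vector<double>& xs, double xmin) {
    double log_sum = 0.0;
    std::size_t n = 0;
    for (double x : xs) {
        if (x >= xmin) {
            log_sum += std::log(x / xmin);
            ++n;
        }
    }
    if (n == 0 || log_sum == 0.0) return std::nan("");
    return 1.0 + static_cast<double>(n) / log_sum;
}

// Hypothetical helper for checking the estimator: draws n samples from a
// continuous power law with exponent alpha (> 1) by inverse-transform
// sampling, x = xmin * (1 - u)^(-1 / (alpha - 1)) with u uniform in [0, 1).
std::vector<double> sample_power_law(std::size_t n, double alpha,
                                     double xmin, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> xs;
    xs.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        xs.push_back(xmin * std::pow(1.0 - unif(rng), -1.0 / (alpha - 1.0)));
    }
    return xs;
}
```

Feeding it synthetic samples drawn with alpha = 3 should give an estimate close to 3, which is a quick way to convince yourself that the log-log regression, and not the generator, is what pulls the exponent down.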
Last week I asked a question about a network which I've been building, and I iterated on the suggestions, which led me to find a few problems. I've come back to this project, fixed all the issues and learnt a lot more about CNNs in the process. Now I'm stuck on an issue where all of my weights move to massively negative values, which, coupled with the RELU, ends in the output image always being completely black (making it impossible for the classifier to do its job).
On two labeled images:
These are passed into a two layer network, one classifier (which gets 100% on its own) and a one filter 3*3 convolutional layer.
On the first iteration the output from the conv layer looks like (images in same order as above):
The filter is 3*3*3, due to the images being RGB. The weights are all random numbers between 0.0f-1.0f. On the next iteration the images are completely black, printing the filters shows that they are now in range of -49678.5f (the highest I can see) and -61932.3f.
This issue in turn is due to the gradients being passed back from the Logistic Regression/Linear layer being crazy high for the cross (label 0, prediction 0). For the circle (label 1, prediction 0) the values are between roughly -12 and -5, but for the cross they are all in the positive high 1000 to high 2000 range.
The code which sends these back looks something like (some parts omitted):
void LinearClassifier::Train(float * x,float output, float y)
{
float h = output - y;
float average = 0.0f;
for (int i =1; i < m_NumberOfWeights; ++i)
{
float error = h*x[i-1];
m_pGradients[i-1] = error;
average += error;
}
average /= static_cast<float>(m_NumberOfWeights-1);
for (int theta = 1; theta < m_NumberOfWeights; ++theta)
{
m_pWeights[theta] = m_pWeights[theta] - learningRate*m_pGradients[theta-1];
}
// Bias
m_pWeights[0] -= learningRate*average;
}
This is passed back to the single convolution layer:
// This code is in three nested for loops (for layer,for outWidth, for outHeight)
float gradient = 0.0f;
// ReLu Derivative
if ( m_pOutputBuffer[outputIndex] > 0.0f)
{
gradient = outputGradients[outputIndex];
}
for (int z = 0; z < m_InputDepth; ++z)
{
for ( int u = 0; u < m_FilterSize; ++u)
{
for ( int v = 0; v < m_FilterSize; ++v)
{
int x = outX + u - 1;
int y = outY + v - 1;
int inputIndex = x + y*m_OutputWidth + z*m_OutputWidth*m_OutputHeight;
int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
m_pGradients[inputIndex] += m_Filters[layer][kernelIndex]*gradient;
m_GradientSum[layer][kernelIndex] += input[inputIndex]*gradient;
}
}
}
This code is iterated over by passing each image in a one at a time fashion. The gradients are obviously going in the right direction but how do I stop the huge gradients from throwing the prediction function?
RELU activations are notorious for doing this. You usually have to use a low learning rate. The reasoning behind this is that when the RELU returns positive numbers it can continue to learn freely, but if a unit gets in a position where the signal coming into it is always negative it can become a "dead" neuron and never activate again.
Also initializing your weights is more delicate with RELU. It appears that you are initializing to range 0-1 which creates a huge bias. Two tips here - Use a range centered around 0, and a range that is much smaller. A normal distribution with mean 0 and std 0.02 usually works well.
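A minimal sketch of that initialization in C++, assuming the filter weights live in a flat array of floats. The function name init_filter_weights and the seed parameter are made up for illustration; the mean of 0 and standard deviation of 0.02 are the values suggested above.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw `count` weights from a normal distribution with mean 0 and the
// given standard deviation, instead of uniform values in [0, 1].
// A fixed seed keeps runs reproducible while debugging.
std::vector<float> init_filter_weights(std::size_t count, float stddev,
                                       unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(0.0f, stddev);
    std::vector<float> weights(count);
    for (float& w : weights) {
        w = dist(rng);
    }
    return weights;
}
```

For the 3*3*3 filter in the question this would be called as init_filter_weights(27, 0.02f, seed).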
I fixed it by downscaling the gradients in the CNN layer, but now I'm confused as to why this works and why it is needed, so if anyone has any intuition about that, that'd be great.
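One way to read this: each update is learningRate times the gradient, so gradients in the thousands move the weights by thousands unless one of the two factors is small. A related technique, which is not what the post above did but bounds the step in a similar way, is clipping the gradient by its norm. A minimal sketch, with clip_by_norm and max_norm as made-up names:

```cpp
#include <cmath>
#include <vector>

// Rescale `grads` in place so that their L2 norm does not exceed max_norm.
// Gradients that are already small enough are left untouched.
// Returns the norm measured before clipping.
float clip_by_norm(std::vector<float>& grads, float max_norm) {
    double sum_sq = 0.0;
    for (float g : grads) {
        sum_sq += static_cast<double>(g) * g;
    }
    const float norm = static_cast<float>(std::sqrt(sum_sq));
    if (norm > max_norm && norm > 0.0f) {
        const float scale = max_norm / norm;
        for (float& g : grads) {
            g *= scale;
        }
    }
    return norm;
}
```

Unlike a fixed downscaling factor, this keeps the direction of the gradient and only shrinks it when it is too large.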
I have a series of 100 integer values which I need to reduce/subsample to 77 values for the purpose of fitting into a predefined space on screen. This gives a fraction of 77/100 values-per-pixel - not very neat.
Assuming the 77 is fixed and cannot be changed, what are some typical techniques for subsampling 100 numbers down to 77? I get a sense that it will be a jagged mapping, by which I mean the first new value is the average of [0, 1] then the next value is [3], then average [4, 5] etc. But how do I approach getting the pattern for this mapping?
I am working in C++, although I'm more interested in the technique than implementation.
Thanks in advance.
Whether you downsample or oversample, you are trying to reconstruct a signal at points in time that were not sampled, so you have to make some assumptions.
The sampling theorem tells you that if you sample a signal known to have no frequency components above half the sampling frequency, you can continuously and completely recover the signal over the whole time period. The reconstruction uses sinc() functions (sinc(x) is sin(x)/x).
sinc() (in full, sin(M_PI*x/Sampling_period)/(M_PI*x/Sampling_period)) is a function that has the following properties:
Its value is 1 for x == 0.0 and 0 for x == k*Sampling_period with k == +-1, +-2, ...
It has no frequency components above half the sampling frequency derived from Sampling_period.
So consider, for each sample k, the function F_k(x) = Y[k]*sinc(x/Sampling_period - k), which equals the sample value at position k and 0 at every other sampling position. Summing these over all k in your sample gives the best continuous function that has no frequency components above half the sampling frequency and takes the same values as your sample set.
With that in hand, you can resample this function at whatever positions you like, which gives the best way to resample your data.
This is by far a complicated way of resampling data (it also has the problem of not being causal, so it cannot be implemented in real time), and several methods have been used in the past to simplify the interpolation. You have to construct all the sinc functions for each sample point and add them together. Then you have to resample the resulting function at the new sampling points and return that as the result.
Next is an example of the interpolation method just described. It accepts some input data (in_sz samples) and outputs data interpolated with that method. I assumed the endpoints coincide, so that the first and last input samples map to the first and last output samples; this is the reason for the somewhat intricate (in_sz - 1)/(out_sz - 1) calculation in the code (change it to in_sz/out_sz if you want a plain N samples -> M samples conversion):
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
/* normalized sinc function */
double sinc(double x)
{
x *= M_PI;
if (x == 0.0) return 1.0;
return sin(x)/x;
} /* sinc */
/* interpolate a function made of in samples at point x */
double sinc_approx(double in[], size_t in_sz, double x)
{
int i;
double res = 0.0;
for (i = 0; i < in_sz; i++)
res += in[i] * sinc(x - i);
return res;
} /* sinc_approx */
/* do the actual resampling. Change (in_sz - 1)/(out_sz - 1) if you
* don't want the initial and final samples coincide, as is done here.
*/
void resample_sinc(
double in[],
size_t in_sz,
double out[],
size_t out_sz)
{
int i;
double dx = (double) (in_sz-1) / (out_sz-1);
for (i = 0; i < out_sz; i++)
out[i] = sinc_approx(in, in_sz, i*dx);
}
/* test case */
int main()
{
double in[] = {
0.0, 1.0, 0.5, 0.2, 0.1, 0.0,
};
const size_t in_sz = sizeof in / sizeof in[0];
const size_t out_sz = 5;
double out[out_sz];
int i;
for (i = 0; i < in_sz; i++)
printf("in[%d] = %.6f\n", i, in[i]);
resample_sinc(in, in_sz, out, out_sz);
for (i = 0; i < out_sz; i++)
printf("out[%.6f] = %.6f\n", (double) i * (in_sz-1)/(out_sz-1), out[i]);
return EXIT_SUCCESS;
} /* main */
There are different ways of interpolation (see wikipedia)
The linear one would be something like:
std::array<int, 77> sampling(const std::array<int, 100>& a)
{
std::array<int, 77> res;
for (int i = 0; i != 76; ++i) {
int index = i * 99 / 76;
int p = i * 99 % 76;
res[i] = ((p * a[index + 1]) + ((76 - p) * a[index])) / 76;
}
res[76] = a[99]; // done outside of loop to avoid out of bound access (0 * a[100])
return res;
}
Create 77 new pixels based on the weighted average of their positions.
As a toy example, think about the 3 pixel case which you want to subsample to 2.
Original (denote as multidimensional array original with RGB as [0, 1, 2]):
|----|----|----|
Subsample (denote as multidimensional array subsample with RGB as [0, 1, 2]):
|------|------|
Here, it is intuitive to see that the first subsample seems like 2/3 of the first original pixel and 1/3 of the next.
For the first subsample pixel, subsample[0], you make it the RGB average of the m original pixels that overlap, in this case original[0] and original[1]. But we do so in weighted fashion.
subsample[0][0] = original[0][0] * 2/3 + original[1][0] * 1/3 # for red
subsample[0][1] = original[0][1] * 2/3 + original[1][1] * 1/3 # for green
subsample[0][2] = original[0][2] * 2/3 + original[1][2] * 1/3 # for blue
In this example original[1][2] is the blue component of the second original pixel.
Keep in mind for different subsampling you'll have to determine the set of original cells that contribute to the subsample, and then normalize to find the relative weights of each.
There are much more complex graphics techniques, but this one is simple and works.
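The weighting above generalises to any pair of sizes, such as 100 values down to 77. A minimal sketch for a single channel, assuming each output cell covers in.size()/out_size input cells and every input cell contributes in proportion to its overlap; resample_area is a made-up name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Resample `in` to `out_size` values by area-weighted averaging.
// Output cell j covers the interval [j*w, (j+1)*w) in input coordinates,
// where w = in.size() / out_size, and each input cell i (covering
// [i, i+1)) is weighted by the length of its overlap with that interval.
std::vector<double> resample_area(const std::vector<double>& in,
                                  std::size_t out_size) {
    std::vector<double> out(out_size, 0.0);
    if (in.empty() || out_size == 0) return out;
    const double w =
        static_cast<double>(in.size()) / static_cast<double>(out_size);
    for (std::size_t j = 0; j < out_size; ++j) {
        const double lo = static_cast<double>(j) * w;
        const double hi = static_cast<double>(j + 1) * w;
        double acc = 0.0;
        for (std::size_t i = static_cast<std::size_t>(lo);
             i < in.size() && static_cast<double>(i) < hi; ++i) {
            const double overlap =
                std::min(hi, static_cast<double>(i + 1)) -
                std::max(lo, static_cast<double>(i));
            if (overlap > 0.0) acc += in[i] * overlap;
        }
        out[j] = acc / w;  // normalise so the weights sum to 1
    }
    return out;
}
```

For the 3-to-2 toy case this reproduces the 2/3 and 1/3 weights: an input of {30, 60, 90} gives {40, 80}. For RGB data you would run it once per channel.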
Everything depends on what you wish to do with the data - how do you want to visualize it.
A very simple approach would be to render to a 100-wide image, and then smooth scale the image down to a narrower size. Whatever graphics/development framework you're using will surely support such an operation.
Say, though, that your goal might be to retain certain qualities of the data, such as minima and maxima. In such a case, for each bin, you're drawing a line of darker color up to the minimum value, and then continue with a lighter color up to the maximum. Or, you could, instead of just putting a pixel at the average value, you draw a line from the minimum to the maximum.
Finally, you might wish to render as if you had 77 values only - then the goal is to somehow transform the 100 values down to 77. This will imply some kind of an interpolation. Linear or quadratic interpolation is easy, but adds distortions to the signal. Ideally, you'd probably want to throw a sinc interpolator at the problem. A good list of them can be found here. For theoretical background, look here.
Given a set of points P I need to find a line L that best approximates these points. I have tried to use the function gsl_fit_linear from the GNU scientific library. However my data set often contains points that have a line of best fit with undefined slope (x=c), thus gsl_fit_linear returns NaN. It is my understanding that it is best to use total least squares for this sort of thing because it is fast, robust and it gives the equation in terms of r and theta (so x=c can still be represented). I can't seem to find any C/C++ code out there currently for this problem. Does anyone know of a library or something that I can use? I've read a few research papers on this but the topic is still a little fuzzy so I don't feel confident implementing my own.
Update:
I made a first attempt at programming my own with armadillo using the given code on this wikipedia page. Alas I have so far been unsuccessful.
This is what I have so far:
void pointsToLine(vector<Point> P)
{
Row<double> x(P.size());
Row<double> y(P.size());
for (int i = 0; i < P.size(); i++)
{
x << P[i].x;
y << P[i].y;
}
int m = P.size();
int n = x.n_cols;
mat Z = join_rows(x, y);
mat U;
vec s;
mat V;
svd(U, s, V, Z);
mat VXY = V(span(0, (n-1)), span(n, (V.n_cols-1)));
mat VYY = V(span(n, (V.n_rows-1)) , span(n, (V.n_cols-1)));
mat B = (-1*VXY) / VYY;
cout << B << endl;
}
The output B is always 0.5504, even when my data set changes. I also thought that the output should be two values, so I'm definitely doing something very wrong.
Thanks!
To find the line that minimises the sum of the squares of the (orthogonal) distances from the line, you can proceed as follows:
The line is the set of points p+r*t where p and t are vectors to be found, and r varies along the line. We restrict t to be unit length. While there is another, simpler, description in two dimensions, this one works with any dimension.
The steps are
1/ compute the mean p of the points
2/ accumulate the covariance matrix C
C = Sum{ i | (q[i]-p)*(q[i]-p)' } / N
(where you have N points and ' denotes transpose)
3/ diagonalise C and take as t the eigenvector corresponding to the largest eigenvalue.
All this can be justified, starting from the (orthogonal) distance squared of a point q from a line represented as above, which is
d2(q) = (q-p)'*(q-p) - ((q-p)'*t)^2
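For the two-dimensional case in the question, the three steps can be sketched without any linear algebra library. This is a sketch under assumptions: the Point and Line structs and the name fit_line_tls are made up, and instead of a general diagonalisation it uses the closed-form angle of the principal axis of a symmetric 2x2 matrix, theta = atan2(2*sxy, sxx - syy) / 2, which only works in 2D.

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };

// The fitted line is the set of points p + r*t.
struct Line {
    Point p;  // mean of the input points
    Point t;  // unit direction vector
};

Line fit_line_tls(const std::vector<Point>& pts) {
    Line line{{0.0, 0.0}, {1.0, 0.0}};
    if (pts.empty()) return line;
    const double n = static_cast<double>(pts.size());

    // Step 1: mean of the points.
    for (const Point& q : pts) {
        line.p.x += q.x;
        line.p.y += q.y;
    }
    line.p.x /= n;
    line.p.y /= n;

    // Step 2: entries of the 2x2 covariance matrix C.
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const Point& q : pts) {
        const double dx = q.x - line.p.x;
        const double dy = q.y - line.p.y;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    sxx /= n;
    sxy /= n;
    syy /= n;

    // Step 3: eigenvector of the largest eigenvalue, via the
    // principal-axis angle of a symmetric 2x2 matrix.
    const double theta = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    line.t = {std::cos(theta), std::sin(theta)};
    return line;
}
```

Points on a vertical line x = c give t close to (0, 1), so no slope is ever computed. If the r and theta form is needed, the unit normal is (-t.y, t.x) and r is its dot product with p.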
I need to calculate the value of a high dimensional integral in C++. I have found numerous libraries capable of solving this task for fixed limit integrals,
\int_{0}^{L} \int_{0}^{L} dx dy f(x,y) .
However the integrals which I am looking at have variable limits,
\int_{0}^{L} \int_{x}^{L} dx dy f(x,y) .
To clarify what I mean, here is a naive Riemann sum implementation in 2D, which returns the desired result,
int steps = 100;
double integral = 0;
double dl = L/((double) steps);
double x[2] = {0};
for(int i = 0; i < steps; i ++){
x[0] = dl*i;
for(int j = i; j < steps; j ++){
x[1] = dl*j;
double val = f(x);
integral += val*val*dl*dl;
}
}
where f is some arbitrary function and L the common upper integration limit. While this implementation works, it's slow and thus impractical for higher dimensions.
Effective algorithms for higher dimensions exist, but to my knowledge, library implementations (e.g. Cuba) take a fixed value vector as the limit argument which renders them useless for my problem.
Is there any reason for this and/or is there any trick to circumvent the problem?
Your integration order is wrong, should be dy dx.
You are integrating over the triangle
0 <= x <= y <= L
inside the square [0,L]x[0,L]. This can be simulated by integrating over the full square where the integrand f is defined as 0 outside of the triangle. In many cases, when f is defined on the full square, this can be accomplished by taking the product of f with the indicator function of the triangle as new integrand.
When integrating over a triangular region such as 0<=x<=y<=L one can take advantage of symmetry: integrate f(min(x,y),max(x,y)) over the square 0<=x,y<=L and divide the result by 2. This has an advantage over extending f by zero (the method mentioned by LutzL) in that the extended function is continuous, which improves the performance of the integration routine.
I compared these on the example of the integral of 2x+y over 0<=x<=y<=1. The true value of the integral is 2/3. Let's compare the performance; for demonstration purpose I use Matlab routine, but this is not specific to language or library used.
Extending by zero
f = @(x,y) (2*x+y).*(x<=y);
result = integral2(f, 0, 1, 0, 1);
fprintf('%.9f\n',result);
Output:
Warning: Reached the maximum number of function evaluations
(10000). The result fails the global error test.
0.666727294
Extending by symmetry
g = @(x,y) (2*min(x,y)+max(x,y));
result2 = integral2(g, 0, 1, 0, 1)/2;
fprintf('%.9f\n',result2);
Output:
0.666666776
The second result is 500 times more accurate than the first.
Unfortunately, this symmetry trick is not available for general domains; but integration over a triangle comes up often enough so it's useful to keep it in mind.
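Since the question is about C++, here is a minimal C++ sketch of the same comparison. It uses a plain midpoint rule on the square as a stand-in for a library routine, which is an assumption made to keep the example self-contained; the function names are made up. Both wrappers turn a triangle integral into a fixed-limit integral over the square, which is the form such libraries expect.

```cpp
#include <algorithm>
#include <functional>

using Fn2 = std::function<double(double, double)>;

// Midpoint-rule integral of g over the square [0,L] x [0,L], n x n cells.
double integrate_square(const Fn2& g, double L, int n) {
    const double h = L / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const double x = (i + 0.5) * h;
        for (int j = 0; j < n; ++j) {
            const double y = (j + 0.5) * h;
            sum += g(x, y);
        }
    }
    return sum * h * h;
}

// Integral of f over 0 <= x <= y <= L, extending f by zero
// outside the triangle (discontinuous integrand).
double triangle_by_zero(const Fn2& f, double L, int n) {
    return integrate_square(
        [&f](double x, double y) { return x <= y ? f(x, y) : 0.0; }, L, n);
}

// Same integral via symmetry: integrate f(min, max) over the
// square and halve the result (continuous integrand).
double triangle_by_symmetry(const Fn2& f, double L, int n) {
    return 0.5 * integrate_square(
        [&f](double x, double y) {
            return f(std::min(x, y), std::max(x, y));
        }, L, n);
}
```

With f(x, y) = 2x + y, L = 1 and n = 200, the symmetric version lands much closer to 2/3 than the zero-extended one, in line with the Matlab comparison. For an ordered region in d variables the same idea would sort the coordinates and divide by d!, though that case is not covered above.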
I was a bit confused by your integral definition but from your code I see it like this:
just did some testing so here is your code:
//---------------------------------------------------------------------------
double f(double *x) { return (x[0]+x[1]); }
void integral0()
{
double L=10.0;
int steps = 10000;
double integral = 0;
double dl = L/((double) steps);
double x[2] = {0};
for(int i = 0; i < steps; i ++){
x[0] = dl*i;
for(int j = i; j < steps; j ++){
x[1] = dl*j;
double val = f(x);
integral += val*val*dl*dl;
}
}
}
//---------------------------------------------------------------------------
Here is optimized code:
//---------------------------------------------------------------------------
void integral1()
{
double L=10.0;
int i0,i1,steps = 10000;
double x[2]={0.0,0.0};
double integral,val,dl=L/((double)steps);
#define f(x) (x[0]+x[1])
integral=0.0;
for(x[0]= 0.0,i0= 0;i0<steps;i0++,x[0]+=dl)
for(x[1]=x[0],i1=i0;i1<steps;i1++,x[1]+=dl)
{
val=f(x);
integral+=val*val;
}
integral*=dl*dl;
#undef f
}
//---------------------------------------------------------------------------
results:
[ 452.639 ms] integral0
[ 336.268 ms] integral1
so the increase in speed is ~ 1.3 times (on 32bit app on WOW64 AMD 3.2GHz)
for higher dimensions it will multiply
but still I think this approach is slow
The only thing to reduce complexity I can think of is algebraically simplify things
either by integration tables or by Laplace or Z transforms
but for that the f(*x) must be known ...
constant time reduction can of course be done
by the use of multi-threading
and/or GPU usage
this can give you N times speed increase
because this is all directly parallelisable
A quick method to compute Fibonacci numbers, using a matrix property
Divide_Conquer_Fib(n) {
i = h = 1;
j = k = 0;
while (n > 0) {
if (n%2 == 1) { // if n is odd
t = j*h;
j = i*h + j*k + t;
i = i*k + t;
}
t = h*h;
h = 2*k*h + t;
k = k*k + t;
n = (int) n/2;
}
return j;
}
How do I understand this code? What would your strategy be? Would you put lots of print statements to see how states of variables change?
It is important to see how various developers' minds would go about understanding this code.
I would start off by running it against a few values of n to check that it actually appears to give the correct answers. Then I'd read up on the mathematical theory to understand how it is likely to be working, and finally use that knowledge to take it to bits…
The Wikipedia entry section on the Matrix form explains the basis for this algorithm.
Well, the proper way to look at this code is to know what it does: Fibonacci numbers come up frequently as an interesting exercise, and there is quite a bit of context saying what the code does: it uses a matrix property together with divide and conquer. It turns out that you can compute the vector (Fib[n], Fib[n-1]) as the product of some matrix and (Fib[n-1], Fib[n-2]). In the layout below, the first and last text rows are the two rows of the same vectors and matrix:
(Fib[n] ) (1 1) (Fib[n-1])
( ) = ( ) * ( )
(Fib[n-1]) (1 0) (Fib[n-2])
Now, multiplication of square matrices is associative, i.e., if the matrix above is M you can compute Fib[n] from M^n times (1, 0).
The next step is to compute M^n using divide and conquer. The basic trick here is that M^n can be decomposed according to the bits of n: instead of computing the power with n multiplications, you decompose the computation into computing squares and multiplying in an extra term whenever the current value is odd.
This is the basic underlying approach. The computation of the powers is done in the other direction, however, which works - I think - because the matrix is symmetric. I don't think you can derive the algorithm from the code easily if you are unaware of the basic approach.
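To make the connection concrete, here is a sketch in C++ with the plain matrix version next to a direct port of the code from the question. The names Mat2, multiply, fib_matrix and divide_conquer_fib are made up. Expanding the products shows that every power of M has the form {{a+b, a}, {a, b}}, so two numbers are enough to store it: in the question's code (h, k) holds the repeatedly squared matrix and (j, i) holds the accumulated product.

```cpp
#include <cstdint>

// 2x2 matrix {{a, b}, {c, d}}.
struct Mat2 {
    std::uint64_t a, b, c, d;
};

Mat2 multiply(const Mat2& x, const Mat2& y) {
    return Mat2{x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
                x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d};
}

// Fib(n) read off M^n = {{Fib(n+1), Fib(n)}, {Fib(n), Fib(n-1)}},
// with M = {{1, 1}, {1, 0}}, computed by repeated squaring.
std::uint64_t fib_matrix(unsigned n) {
    Mat2 result{1, 0, 0, 1};  // identity
    Mat2 base{1, 1, 1, 0};    // M
    while (n > 0) {
        if (n % 2 == 1) result = multiply(result, base);
        base = multiply(base, base);
        n /= 2;
    }
    return result.b;
}

// Direct port of Divide_Conquer_Fib from the question.
// (h, k) encodes {{h+k, h}, {h, k}}, the current power of M;
// (j, i) encodes {{i+j, j}, {j, i}}, the product accumulated so far.
std::uint64_t divide_conquer_fib(unsigned n) {
    std::uint64_t i = 1, h = 1, j = 0, k = 0, t = 0;
    while (n > 0) {
        if (n % 2 == 1) {  // multiply the accumulator by the current power
            t = j * h;
            j = i * h + j * k + t;
            i = i * k + t;
        }
        t = h * h;  // square the current power
        h = 2 * k * h + t;
        k = k * k + t;
        n /= 2;
    }
    return j;
}
```

Both functions return 55 for n = 10. Running them side by side for a range of n is a cheaper way to build confidence than print statements.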