How can one use pymc to parameterize a probabilistic graphical model?
Suppose I have a PGM with two nodes X and Y.
Let's say X -> Y is the graph, X takes two values {0, 1}, and
Y also takes two values {0, 1}.
I want to use pymc to learn the parameters of the distribution and populate the
graphical model with it for running inferences.
The way I could think of is as follows:
import pymc as pm  # PyMC2-style API, to match the calls below

X_p = pm.Uniform("X_p", 0, 1)
X = pm.Bernoulli("X", X_p, value=X_Vals, observed=True)
Y0_p = pm.Uniform("Y0_p", 0, 1)
Y0 = pm.Bernoulli("Y0", Y0_p, value=Y0Vals, observed=True)
Y1_p = pm.Uniform("Y1_p", 0, 1)
Y1 = pm.Bernoulli("Y1", Y1_p, value=Y1Vals, observed=True)
Here Y0Vals are the values of Y observed where X = 0,
and Y1Vals are the values of Y observed where X = 1.
The plan is to draw MCMC samples from these and use the means of Y0_p and Y1_p
to populate the discrete Bayesian network's probability tables. Since a Bernoulli
parameter is the probability of the outcome 1, the table for P(X) is
(P(X=0), P(X=1)) = (1-X_p, X_p), while that of P(Y|X) is:

        Y=0       Y=1
X=0     1-Y0_p    Y0_p
X=1     1-Y1_p    Y1_p
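For concreteness, here is a minimal runnable sketch of that plan, written against the PyMC3-style API (a pm.Model context with observed= arrays); the observed data below are made up purely for illustration:

import numpy as np
import pymc3 as pm

# made-up observations, for illustration only
x_obs = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_obs = np.array([0, 1, 0, 0, 1, 1, 1, 1])

with pm.Model():
    x_p = pm.Uniform("X_p", 0, 1)
    pm.Bernoulli("X", p=x_p, observed=x_obs)
    y0_p = pm.Uniform("Y0_p", 0, 1)
    pm.Bernoulli("Y0", p=y0_p, observed=y_obs[x_obs == 0])
    y1_p = pm.Uniform("Y1_p", 0, 1)
    pm.Bernoulli("Y1", p=y1_p, observed=y_obs[x_obs == 1])
    # newer PyMC versions return an InferenceData object here instead of a MultiTrace
    trace = pm.sample(2000, tune=1000)

# posterior means fill the CPT: e.g. P(Y=1 | X=0) is estimated by the mean of Y0_p
print(trace["X_p"].mean(), trace["Y0_p"].mean(), trace["Y1_p"].mean())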
Questions:
Is this the correct way of doing this?
Doesn't this get clumsy, especially if X takes hundreds of discrete values,
or if a variable has two parents X and Y with 10 discrete values each?
Is there something better I can do?
Are there any good books that detail how to do this kind of interconnection?
Between multiple .backward() passes I'd like to reset the gradients to zero. Right now I have to do this separately for every leaf tensor (here these are x and t); is there a way to do this "globally" for all affected variables? (I imagine something like z.set_all_gradients_to_zero().)
I know there is optimizer.zero_grad() if you use an optimizer, but is there also a direct way without using an optimizer?
import torch
x = torch.randn(3, requires_grad = True)
t = torch.randn(3, requires_grad = True)
y = x + t
z = y + y.flip(0)
z.backward(torch.tensor([1., 0., 0.]), retain_graph = True)
print(x.grad)
print(t.grad)
x.grad.data.zero_() # both gradients need to be set to zero
t.grad.data.zero_()
z.backward(torch.tensor([0., 1., 0.]), retain_graph = True)
print(x.grad)
print(t.grad)
You can also use nn.Module.zero_grad(). In fact, optim.zero_grad() just calls nn.Module.zero_grad() on all parameters which were passed to it.
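For instance (a minimal sketch; nn.Linear is just a stand-in for any module):

import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
out = layer(torch.randn(4, 3)).sum()
out.backward()

print(layer.weight.grad)  # populated by backward()
layer.zero_grad()         # clears the gradients of every parameter of the module
print(layer.weight.grad)  # zeroed, or None on versions where set_to_none=True is the default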
There is no reasonable way to do it globally. You can collect your variables in a list
grad_vars = [x, t]
for var in grad_vars:
    var.grad = None
or create some hacky function based on vars(). Perhaps it's also possible to inspect the computation graph and zero the gradient of all leaf nodes, but I am not familiar with the graph API. Long story short, you're expected to use the object-oriented interface of torch.nn instead of manually creating tensor variables.
I have a location database for some objects using a PointField with 3 dimensions. But it turns out some users only have the z coordinate (elevation) for the objects, and not x and y. PointField doesn't let me define a point with only a z coordinate.
I really don't want to break up the PointField into two fields as it is very convenient for me this way. It would be great if there is a work-around that I could use.
In []: Point(x=1,y=2,z=3).z
Out[]: 3.0
In []: Point(z=3).z
Out[]:
It is very strange for a user to have an elevation value but not the lon/lat of their position, but as a workaround, why don't you use a fixed default value for x, y, or z when the user does not provide it?
from django.contrib.gis.geos import Point

DEFAULT_X = 0
DEFAULT_Y = 0
DEFAULT_Z = 0

def create_point(x=None, y=None, z=None):
    # compare against None so a legitimate coordinate of 0 is not replaced
    return Point(
        x=x if x is not None else DEFAULT_X,
        y=y if y is not None else DEFAULT_Y,
        z=z if z is not None else DEFAULT_Z,
    )
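Calling it with only z then falls back to the defaults for the other two coordinates:

p = create_point(z=3)
print(p.x, p.y, p.z)  # 0.0 0.0 3.0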
Keep in mind that your default values should be something that cannot randomly occur in your dataset, else you may mix up your data.
I have one value that is a floating point percentage from 0-100, x, and another value that is a floating point from 0-1, y. As y gets closer to zero, it should reduce the value of x on a logarithmic curve.
So for example, say x = 28.0f and y = 0.8f. Since 0.8f isn't that far from 1.0f it should only reduce the value of x by a small amount, say bringing it down to x = 25.0f or something like that. As y gets closer to zero it should more and more drastically reduce the value of x. The only way I can think of doing this is with a logarithmic curve. I know what I want it to do, but I cannot for the life of me figure out how to implement this in C++. What would this algorithm look like in C++?
It sounds like you want this:
new_x = x * ln((e - 1) * y + 1)
I'm assuming you have the natural log function ln and the constant e. The number multiplied by x is a logarithmic function of y which is 0 when y = 0 and 1 when y = 1.
Here's the logic behind that function (this is basically a math problem, not a programming problem). You want something that looks like the ln function, rising steeply at first and then leveling off. But you want it to start at (0, 0) and then pass through (1, 1), and ln starts at (1, 0) and passes through (e, 1). That suggests that before you take the ln, you apply a simple linear shift that takes 0 to 1 and 1 to e: (e - 1) * y + 1.
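A quick numeric sanity check of that formula (Python used here just for brevity; in C++ the same thing is std::log and M_E from <cmath>, with the helper name damp being an arbitrary choice):

import math

def damp(x, y):
    # ln maps the shifted y back onto [0, 1]: ln(1) = 0 at y = 0, ln(e) = 1 at y = 1
    return x * math.log((math.e - 1.0) * y + 1.0)

print(damp(28.0, 0.8))                    # ~24.2, a mild reduction
print(damp(28.0, 0.0), damp(28.0, 1.0))   # 0.0 and 28.0 at the endpoints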
We can try with the following assumption: we need a function f(y) such that f(0)=0 and f(1)=1 which follows some logarithmic curve, maybe something like f(y) = A*log(B + C*y), with A, B and C constants to be determined.
f(0)=0, so B=1
f(1)=1, so A=1/log(1+C)
So now we just need to find a C value so that f(0.8) is roughly equal to 25/28. A few experiments show that C=4 is rather close. You can get closer if you want.
So one possibility would be: f(y) = log(1.0 + 4.0*y) / log(5.0)
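A quick check of this candidate (again in Python, just to verify the endpoints and the f(0.8) target):

import math

def f(y):
    # f(0) = log(1)/log(5) = 0, f(1) = log(5)/log(5) = 1
    return math.log(1.0 + 4.0 * y) / math.log(5.0)

print(f(0.0), f(1.0))  # 0.0 and 1.0 at the endpoints
print(28.0 * f(0.8))   # ~24.97, close to the target of 25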
Given a 1-dimensional array of binary variables, for example
x = [0,1,0,0,1]
I would like to create a new variable y such that y <= max(x). In other words
y = 0 only if sum(x) = 0.
y = 1 only if sum(x) > 0.
How do I convert this into a set of linear constraints?
I know this must be possible because IBM CP Optimizer Suite can handle this automatically, but I don't have access to it.
Try something simple like y <= sum(x) which will force y to zero if all the x are zero.
Then for forcing y to 1 you have several choices. You could simply add a constraint that y >= x for every variable in x, or use a big M constraint like My >= sum(x) where M is some constant which is the maximum number of variables in x that can be simultaneously equal to 1. Adding the separate constraints might give a tighter linear relaxation, especially if there are many x variables.
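As a concrete sketch, here is how those constraints look in PuLP (an assumed stand-in for whatever MILP modeler you have access to; only the constraints are shown, with no objective):

from pulp import LpProblem, LpVariable, lpSum

prob = LpProblem("y_is_max_of_x")
x = [LpVariable("x%d" % i, cat="Binary") for i in range(5)]
y = LpVariable("y", cat="Binary")

prob += y <= lpSum(x)   # forces y = 0 when every x is 0
for xi in x:
    prob += y >= xi     # forces y = 1 as soon as any x is 1
# big-M alternative to the per-variable constraints: prob += len(x) * y >= lpSum(x)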
I calculated the histogram (a simple 1D array) for a 3D grayscale image.
Now I would like to calculate the gradient of this histogram at each point. So this would actually mean I have to calculate the gradient of a 1D function at certain points. However, I do not have a function. So how can I calculate it with concrete x and y values?
For the sake of simplicity, could you explain this to me using an example histogram, say with the following values (x is the intensity, and y the frequency of this intensity):
x1 = 1; y1 = 3
x2 = 2; y2 = 6
x3 = 3; y3 = 8
x4 = 4; y4 = 5
x5 = 5; y5 = 9
x6 = 6; y6 = 12
x7 = 7; y7 = 5
x8 = 8; y8 = 3
x9 = 9; y9 = 5
x10 = 10; y10 = 2
I know that this is also a math problem, but since I need to solve it in C++ I thought you could help me here.
Thank you for your advice
Marc
I think you can calculate your gradient using the same approach used in image edge detection (which is a gradient computation). If your histogram is in a vector you can calculate an approximation of the gradient as*:
// forward difference at every point except the last
for (std::size_t x = 0; x + 1 < hist.size(); ++x)
    gradient[x] = hist[x + 1] - hist[x];
This is a very simple way to do it, but I'm not sure if it is the most accurate.

*an approximation, because you are working with discrete data instead of continuous data
Edited:
Other operators may emphasize small differences (small gradients become more pronounced). The Roberts operator derives from the definition of the derivative:
f'(x) = lim_{delta -> 0} (f(x + delta) - f(x)) / delta
delta tends to 0 (in order to avoid division by zero) but is never zero. Since that is impossible with data in a computer's memory, the smallest delta we can use is 1 (because 1 is the smallest distance between two points in an image, or histogram).
Substituting delta = 1, we get

(f(x + 1) - f(x)) / 1 = f(x + 1) - f(x) => hist[x + 1] - hist[x]
Two general approaches here:
a discrete approximation to the derivative
take the real derivative of a fitted function
In the first case try:
g_i = (y_(i+1) - y_(i-1)) / (2*dx)
at all the points except the ends, or one of
g_left-end = (y_(i+1) - y_i)/dx
g_right-end = (y_i - y_(i-1))/dx
where dx is the spacing between x points. (Unlike the equally correct definition Andres suggested, this one is symmetric. Whether it matters or not depends on your use case.)
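Incidentally, if Python were an option, numpy.gradient implements exactly this scheme (central differences in the interior, one-sided differences at the two ends); a quick check against the question's data:

import numpy as np

# frequencies from the example histogram in the question
y = np.array([3, 6, 8, 5, 9, 12, 5, 3, 5, 2], dtype=float)
g = np.gradient(y, 1.0)  # dx = 1, the spacing between intensity bins
print(g)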
In the second case, fit a spline to your data[*], and ask the spline library the derivative at the point you want.
[*] Use a library! Do not implement this yourself unless this is a learning project. I'd use ROOT because I already have it on my machine, but it is a pretty heavy package just to get a spline...
Finally, if your data is noisy, you may want to smooth it before doing slope detection. That way you avoid chasing the noise and only look at large-scale slopes.
Take some squared paper and draw on it your histogram. Draw also vertical and horizontal axes through the 0,0 point of your histogram.
Take a straight edge and, at each point you are interested in, rotate the straight edge until it accords with your idea of what the gradient at that point is. It is most important that you do this, your definition of gradient is the one you want.
Once the straight edge is at the angle you desire draw a line at that angle.
Drop perpendiculars from any 2 points on the line you just drew. It will be easier to take the following step if the horizontal distance between the 2 points you choose is about 25% or more of the width of your histogram. From the same 2 points draw horizontal lines to intersect the vertical axis of your histogram.
Your lines now define an x-distance and a y-distance, i.e. the lengths of the horizontal and vertical axis segments marked out by their intersections with the horizontal lines and the perpendiculars respectively. The gradient you want is the y-distance divided by the x-distance.
Now, translating this into code is very straightforward, apart from the second step: you have to define the criteria for determining what the gradient at any point on the histogram is. Simple choices include:
a) at each point, set down your straight edge to pass through the point and the next one to its right;
b) at each point, set down your straight edge to pass through the point and the next one to its left;
c) at each point, set down your straight edge to pass through the point to the left and the point to the right.
You may want to investigate more complex choices such as fitting a curve (such as a quadratic or higher-order polynomial) through a number of points on your histogram and using the derivative of that to represent the gradient.
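As a sketch of that last idea (NumPy used for brevity; the cubic degree is an arbitrary choice):

import numpy as np

x = np.arange(1, 11, dtype=float)
y = np.array([3, 6, 8, 5, 9, 12, 5, 3, 5, 2], dtype=float)

coeffs = np.polyfit(x, y, 3)                # fit a cubic through the histogram
slopes = np.polyval(np.polyder(coeffs), x)  # evaluate the fit's derivative at each x
print(slopes)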
Until you understand the question on paper avoid coding in C++ or anything else. Once you do understand it, coding should be trivial.