Understanding OpenGL Matrices

I'm starting to learn about 3D rendering and I've been making good progress. I've picked up a lot regarding matrices and the general operations that can be performed on them.
One thing I'm still not quite following is OpenGL's use of matrices. I see this (and things like it) quite a lot:
x y z n
-------
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
So my best understanding is that it is a normalized (no magnitude), 4-dimensional, column-major matrix. Also, this matrix in particular is called the "identity matrix".
Some questions:
What is the "nth" dimension?
How and when are these applied?
My biggest confusion arises from how OpenGL makes use of this kind of data.

In most 3D graphics a point is represented by a 4-component vector (x, y, z, w), where w = 1. Usual operations applied to a point include translation, scaling, rotation, reflection, skewing and combinations of these.
These transformations can be represented by a mathematical object called a "matrix". A matrix is applied to a vector like this:
[ a b c tx ] [ x ]   [ a*x + b*y + c*z + tx*w ]
| d e f ty | | y | = | d*x + e*y + f*z + ty*w |
| g h i tz | | z |   | g*x + h*y + i*z + tz*w |
[ p q r s  ] [ w ]   [ p*x + q*y + r*z + s*w  ]
For example, scaling is represented as
[ 2 . . . ] [ x ]   [ 2x ]
| . 2 . . | | y | = | 2y |
| . . 2 . | | z |   | 2z |
[ . . . 1 ] [ 1 ]   [ 1  ]
and translation as
[ 1 . . dx ] [ x ]   [ x + dx ]
| . 1 . dy | | y | = | y + dy |
| . . 1 dz | | z |   | z + dz |
[ . . . 1  ] [ 1 ]   [ 1      ]
One of the reasons for the 4th component is to make translation representable by a matrix.
The advantage of using a matrix is that multiple transformations can be combined into one via matrix multiplication.
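To make that concrete, here is a minimal sketch in plain C++ (the Mat4/Vec4 types are hypothetical helpers written just for this illustration, not an OpenGL API) that composes a scale and a translation and applies the result to a point:
#include <array>
#include <cstdio>

// Hypothetical minimal types, only to illustrate the math above.
using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // m[row][col]

// Apply a matrix to a column vector, exactly as in the formula above.
Vec4 apply(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Combine two transformations: applying (a*b) equals applying b first, then a.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

int main() {
    Mat4 scale     = {{{2, 0, 0, 0}, {0, 2, 0, 0}, {0, 0, 2, 0}, {0, 0, 0, 1}}};
    Mat4 translate = {{{1, 0, 0, 5}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};

    Mat4 combined = multiply(translate, scale);  // scale first, then translate
    Vec4 p = {1, 2, 3, 1};                       // a point, so w = 1
    Vec4 q = apply(combined, p);                 // (2*1 + 5, 2*2, 2*3, 1) = (7, 4, 6, 1)
    std::printf("%g %g %g %g\n", q[0], q[1], q[2], q[3]);
}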
Now, if the purpose were simply to bring translation to the table, then I'd use (x, y, z, 1) instead of (x, y, z, w) and make the last row of the matrix always [0 0 0 1], as is usually done for 2D graphics. In fact, the 4-component vector is mapped back to the normal 3-vector via this formula:
[ x(3D) ]   [ x / w ]
| y(3D) | = | y / w |
[ z(3D) ]   [ z / w ]
This is called homogeneous coordinates. Allowing this makes the perspective projection expressible with a matrix too, which can again be combined with all the other transformations.
For example, since objects farther away should be smaller on screen, we transform the 3D coordinates into 2D using the formula
x(2D) = x(3D) / (10 * z(3D))
y(2D) = y(3D) / (10 * z(3D))
Now if we apply the projection matrix
[ 1 . .  . ] [ x ]   [ x    ]
| . 1 .  . | | y | = | y    |
| . . 1  . | | z |   | z    |
[ . . 10 . ] [ 1 ]   [ 10*z ]
then the recovered 3D coordinates become
x(3D) := x/w = x/(10*z)
y(3D) := y/w = y/(10*z)
z(3D) := z/w = 0.1
so we just need to chop the z-coordinate out to project to 2D.
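A tiny self-contained sketch of that last step (plain C++ arithmetic only, not actual OpenGL calls; the sample point is arbitrary) might look like this:
#include <cstdio>

int main() {
    // A point in homogeneous coordinates (w = 1).
    double x = 3.0, y = 6.0, z = 2.0;

    // Apply the projection matrix above: x, y, z pass through, w' = 10*z.
    double xp = x, yp = y, zp = z, wp = 10.0 * z;

    // Map back to 3D by dividing by w' (the homogeneous divide).
    double x3 = xp / wp;  // x / (10*z) = 0.15
    double y3 = yp / wp;  // y / (10*z) = 0.3
    double z3 = zp / wp;  // always 0.1, so it carries no information and can be dropped

    std::printf("2D projection: (%g, %g), z after the divide: %g\n", x3, y3, z3);
}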

The short answer that might help you get started is that the 'nth' dimension, as you call it, does not represent any visualizable quantity. It is added as a practical tool to enable matrix multiplications that produce translation and perspective projection. A plain 3x3 matrix cannot do those things.
A 3D value representing a point in space always gets 1 appended as the fourth value to make this trick work. A 3D value representing a direction (e.g. a normal, if you are familiar with that term) gets 0 appended in the fourth spot.
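A small sketch of that last point (again plain C++, with only the x component worked out) shows why the 0 matters: a pure translation moves points but leaves directions alone:
#include <cstdio>

int main() {
    double dx = 5.0;  // translate by (5, 0, 0)

    double point[4]     = {1.0, 2.0, 3.0, 1.0};  // a position: w = 1
    double direction[4] = {0.0, 0.0, 1.0, 0.0};  // a direction/normal: w = 0

    // From the translation matrix above: x' = x + dx*w.
    double px = point[0] + dx * point[3];          // 1 + 5*1 = 6: the point moves
    double nx = direction[0] + dx * direction[3];  // 0 + 5*0 = 0: the direction does not

    std::printf("translated point x = %g, translated direction x = %g\n", px, nx);
}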


Is there a way to auto-indent code inside playground in Pharo?

I noticed, while playing with Pharo, that regardless of the way I type code in the playground, if I try to run it and there is some problem, the debugger that opens shows my code nicely indented.
For example, if I have this not particularly well indented program in the playground:
| banditA banditB testBandits multiRun results |
banditA := [ 1.0 random <= 0.6 ifTrue: [ 1.0 ] ifFalse: [ 0.0 ] ].
banditB := [ 1.0 random > 0.6 ifTrue: [ 1.0 ] ifFalse: [ 0.0 ] ].
"Tests banditA and banditB n times each, to see which one is better."
testBandits := [ :n |
| A B |.
A := 0.0.
B := 0.0.
n timesRepeat: [ A := A + banditA value ].
n timesRepeat: [ B := B + banditB value ].
A > B
ifTrue: [ { A + B . banditA} ]
ifFalse: [ A < B
ifTrue: [ { A + B . banditB} ]
ifFalse: [ { A + B . { banditA. banditB } atRandom } ] ] ].
"Accumulate the results of size - 2 * nTests number of trials of winning
bandits, each preceded with a explore phase with 2 * nTests (half to each
bandit."
multiRun := [ :nTests :size |
| testResults sum |
sum := 0.0.
testResults := (testBandits value: nTests).
2 * nTests negated + size timesRepeat: [ sum := sum + (testResults at: 2) value ].
sum ].
"Average the returns of a thousand runs, to each even number of tests between 1 and
size = 1000."
results := (1 to: 500) collect:
[ :n | { 2*n. ((1 to: 1000) collect: [ :each | multiRun value: n value: 1000]) average}].
Transcript clear; show: 'Number of tests;Return'; cr.
results do: [ :each | Transcript show: (each at: 1); show: ';'; show: (each at: 2); cr ].
results sorted: [ :first :second | (first at: 2) > (second at: 2) ]
If I open the debugger on this code sample with Ctrl+Shift+D, the debugger shows me a nicely formatted DoIt object, minus the comments:
DoIt
    | banditA banditB testBandits multiRun results |
    banditA := [ 1.0 random <= 0.6
        ifTrue: [ 1.0 ]
        ifFalse: [ 0.0 ] ].
    banditB := [ 1.0 random > 0.6
        ifTrue: [ 1.0 ]
        ifFalse: [ 0.0 ] ].
    testBandits := [ :n |
        | A B |
        A := 0.0.
        B := 0.0.
        n timesRepeat: [ A := A + banditA value ].
        n timesRepeat: [ B := B + banditB value ].
        A > B
            ifTrue: [ {(A + B).
                banditA} ]
            ifFalse: [ A < B
                ifTrue: [ {(A + B).
                    banditB} ]
                ifFalse: [ {(A + B).
                    {banditA.
                    banditB} atRandom} ] ] ].
    multiRun := [ :nTests :size |
        | testResults sum |
        sum := 0.0.
        testResults := testBandits value: nTests.
        2 * nTests negated + size
            timesRepeat: [ sum := sum + (testResults at: 2) value ].
        sum ].
    results := (1 to: 500)
        collect: [ :n |
            {(2 * n).
            ((1 to: 1000) collect: [ :each | multiRun value: n value: 1000 ])
                average} ].
    Transcript
        clear;
        show: 'Number of tests;Return';
        cr.
    results
        do: [ :each |
            Transcript
                show: (each at: 1);
                show: ';';
                show: (each at: 2);
                cr ].
    ^ results sorted: [ :first :second | (first at: 2) > (second at: 2) ]
After some digging, I discovered there is an option to format source code inside the class browser, just by right-clicking the source code, then choosing Source code > Format code:
But I couldn't find a way to do the same to source code inside the playground, as the Source code > Format code option is not shown on right-click. If I try the keyboard shortcut Ctrl-T inside the playground, it just erases whatever is selected. I could copy code from the playground into the browser, format it there, and then copy it back to the playground, or I could open the debugger on the sample and then copy the nicely formatted code into the playground, erasing the DoIt top line and the return symbol ^ from the expression in the last line, but that's not convenient. So I'd like to know if there is a proper way.
By the way, the code sample I used as an example is my attempt at simulating an instance of the Multi-armed bandit problem used in a psychological experiment described in the book Algorithms to Live By: The Computer Science of Human Decisions:
"Once you become familiar with them, it’s easy to see multi-armed
bandits just about everywhere we turn. It’s rare that we make an
isolated decision, where the outcome doesn’t provide us with any
information that we’ll use to make other decisions in the future. So
it’s natural to ask, as we did with optimal stopping, how well people
generally tend to solve these problems—a question that has been
extensively explored in the laboratory by psychologists and behavioral
economists. In general, it seems that people tend to over-explore—to
favor the new disproportionately over the best. In a simple
demonstration of this phenomenon, published in 1966, Amos Tversky and
Ward Edwards conducted experiments where people were shown a box with
two lights on it and told that each light would turn on a fixed (but
unknown) percentage of the time. They were then given 1,000
opportunities either to observe which light came on, or to place a bet
on the outcome without getting to observe it. (Unlike a more
traditional bandit problem setup, here one could not make a “pull”
that was both wager and observation at once; participants would not
learn whether their bets had paid off until the end.) This is pure
exploration vs. exploitation, pitting the gaining of information
squarely against the use of it. For the most part, people adopted a
sensible strategy of observing for a while, then placing bets on what
seemed like the best outcome—but they consistently spent a lot more
time observing than they should have. How much more time? In one
experiment, one light came on 60% of the time and the other 40% of the
time, a difference neither particularly blatant nor particularly
subtle. In that case, people chose to observe 505 times, on average,
placing bets the other 495 times. But the math says they should have
started to bet after just 38 observations—leaving 962 chances to cash
in."
I'd like to see if I could arrive at this figure of 38 as an optimal number, so I wrote this simulation. I arrived pretty close; this plot shows the results of one run:
This particular run resulted in a maximum at 34, quite close to 38. There is some variation between runs, as the curve gets a bit noisy at the top, but the maximum is consistently less than ten positions away from 38.

Select right kernel size for median blur to reduce noise

I am new to image processing. We have a requirement to get circle centers with sub-pixel accuracy from an image. I have used median blurring to reduce the noise. A portion of the image is shown below. The steps I followed for getting circle boundaries are given below:
Reduced the noise with medianBlur
Applied Otsu thresholding with the threshold API
Identified circle boundaries with the findContours method.
I get different results when using different kernel sizes for medianBlur. I selected medianBlur to preserve edges. I tried kernel sizes 3, 5 and 7. Now I am unsure which kernel size to use for medianBlur.
How can I decide the right kernel size?
Is there any scientific approach to decide the right kernel size for medianBlur?
I will give you two suggestions here for how to find the centroids of these disks; you can pick one depending on the level of precision you need.
First of all, using contours is not the best method. Contours depend a lot on which pixels happen to fall within the object on thresholding, and noise affects these a lot.
A better method is to find the center of mass (or rather, the first order moments) of the disks. Read Wikipedia to learn more about moments in image analysis. One nice thing about moments is that we can use pixel values as weights, increasing precision.
You can compute the moments of a binary shape from its contours, but you cannot use image intensities in this case. OpenCV has a function cv::moments that computes the moments for the whole image, but I don't know of a function that can do this for each object separately. So instead I'll be using DIPlib for these computations (I'm an author).
Regarding the filtering:
Any well-behaved linear smoothing should not affect the center of mass of the objects, as long as the objects are far enough from the image edge. Being close to the edge will cause the blur to do something different on the side of the object closest to the edge compared to the other sides, introducing a bias.
Any non-linear smoothing filter has the ability to change the center of mass. Please avoid the median filter.
So, I recommend that you use a Gaussian filter, which is the most well-behaved linear smoothing filter.
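Since your current pipeline is OpenCV, that would simply mean swapping cv::medianBlur for cv::GaussianBlur before the Otsu threshold. A minimal sketch (the file name and the sigma of 2 are placeholders, not values taken from your setup):
#include <opencv2/opencv.hpp>

int main() {
    // Placeholder file name; load as 8-bit grayscale.
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    // Gaussian smoothing instead of the median filter; with ksize = (0,0)
    // OpenCV derives the kernel size from sigma (here sigma = 2, a guess).
    cv::Mat smoothed;
    cv::GaussianBlur(img, smoothed, cv::Size(0, 0), 2.0);

    // Otsu thresholding, inverted so the dark disks become foreground.
    cv::Mat binary;
    cv::threshold(smoothed, binary, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);

    cv::imwrite("binary.png", binary);
}
The sigma then plays the role your kernel size played; a small sigma (one or two pixels) is usually enough to suppress pixel noise without distorting the disk edges much.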
Method 1: use the binary shape's moments
First I'm going to threshold without any form of blurring.
import diplib as dip
a = dip.ImageRead('/Users/cris/Downloads/Ef8ey.png')
a = a(1) # Use green channel only, simple way to convert to gray scale
_, t = dip.Threshold(a)
b = a<t
m = dip.Label(b)
msr = dip.MeasurementTool.Measure(m, None, ['Center'])
print(msr)
This outputs
   |          Center
 - | -----------------------
   |       dim0 |       dim1
   |       (px) |       (px)
 - | ---------- | ----------
 1 |      18.68 |      9.234
 2 |      68.00 |      14.26
 3 |      19.49 |      48.22
 4 |      59.68 |      52.42
We can now apply a smoothing to the input image a and compute again:
a = dip.Gauss(a,2)
_, t = dip.Threshold(a)
b = a<t
m = dip.Label(b)
msr = dip.MeasurementTool.Measure(m, None, ['Center'])
print(msr)
   |          Center
 - | -----------------------
   |       dim0 |       dim1
   |       (px) |       (px)
 - | ---------- | ----------
 1 |      18.82 |      9.177
 2 |      67.74 |      14.27
 3 |      19.51 |      47.95
 4 |      59.89 |      52.39
You can see there's some small change in the centroids.
Method 2: use gray-scale moments
Here we use the error function to apply a pseudo-threshold to the image. What this does is set object pixels to 1 and background pixels to 0, but pixels around the edges retain some intermediate value. Some people refer to this as "fuzzy thresholding". These two images show the normal ("hard") threshold and the error function clip ("fuzzy threshold"):
By using this fuzzy threshold, we retain more information about the exact (sub-pixel) location of the edges, which we can use when computing the first order moments.
import diplib as dip
a = dip.ImageRead('/Users/cris/Downloads/Ef8ey.png')
a = a(1) # Use green channel only, simple way to convert to gray scale
_, t = dip.Threshold(a)
c = dip.ContrastStretch(-dip.ErfClip(a, t, 30))
m = dip.Label(a<t)
m = dip.GrowRegions(m, None, -2, 2)
msr = dip.MeasurementTool.Measure(m, c, ['Gravity'])
print(msr)
This outputs
   |         Gravity
 - | -----------------------
   |       dim0 |       dim1
   |       (px) |       (px)
 - | ---------- | ----------
 1 |      18.75 |      9.138
 2 |      67.89 |      14.22
 3 |      19.50 |      48.02
 4 |      59.79 |      52.38
We can now apply a smoothing to the input image a and compute again:
a = dip.Gauss(a,2)
_, t = dip.Threshold(a)
c = dip.ContrastStretch(-dip.ErfClip(a, t, 30))
m = dip.Label(a<t)
m = dip.GrowRegions(m, None, -2, 2)
msr = dip.MeasurementTool.Measure(m, c, ['Gravity'])
print(msr)
   |         Gravity
 - | -----------------------
   |       dim0 |       dim1
   |       (px) |       (px)
 - | ---------- | ----------
 1 |      18.76 |      9.094
 2 |      67.87 |      14.19
 3 |      19.50 |      48.00
 4 |      59.81 |      52.39
You can see the differences are smaller this time, because the measurement is more precise.
In the binary case, the differences in centroids with and without smoothing are:
array([[ 0.14768417, -0.05677508],
[-0.256 , 0.01668085],
[ 0.02071882, -0.27547569],
[ 0.2137167 , -0.03472741]])
In the gray-scale case, the differences are:
array([[ 0.01277204, -0.04444567],
[-0.02842993, -0.0276569 ],
[-0.00023144, -0.01711335],
[ 0.01776011, 0.01123299]])
If the centroid measurement is given in µm rather than px, it is because your image file contains pixel size information. The measurement function will use this to give you real-world measurements (the centroid coordinate is w.r.t. the top-left pixel). If you do not desire this, you can reset the image's pixel size:
a.SetPixelSize(1)
The two methods in C++
This is a translation to C++ of the code above, including a display step to double-check that the thresholding produced the right result:
#include "diplib.h"
#include "dipviewer.h"
#include "diplib/simple_file_io.h"
#include "diplib/linear.h" // for dip::Gauss()
#include "diplib/segmentation.h" // for dip::Threshold()
#include "diplib/regions.h" // for dip::Label()
#include "diplib/measurement.h"
#include "diplib/mapping.h" // for dip::ContrastStretch() and dip::ErfClip()
int main() {
    auto a = dip::ImageRead("/Users/cris/Downloads/Ef8ey.png");
    a = a[1]; // Use green channel only, simple way to convert to gray scale
    dip::Gauss(a, a, {2});
    dip::Image b;
    double t = dip::Threshold(a, b);
    b = a < t; // Or: dip::Invert(b,b);

    dip::viewer::Show(a);
    dip::viewer::Show(b); // Verify that the segmentation is correct
    dip::viewer::Spin();

    auto m = dip::Label(b);
    dip::MeasurementTool measurementTool;
    auto msr = measurementTool.Measure(m, {}, { "Center" });
    std::cout << msr << '\n';

    auto c = dip::ContrastStretch(-dip::ErfClip(a, t, 30));
    dip::GrowRegions(m, {}, m, -2, 2);
    msr = measurementTool.Measure(m, c, { "Gravity" });
    std::cout << msr << '\n';

    // Iterate through the measurement structure:
    auto it = msr["Gravity"].FirstObject();
    do {
        std::cout << "Centroid coordinates = " << it[0] << ", " << it[1] << '\n';
    } while (++it);
}

Ignore missing values when generating new variable

I want to create a new variable in Stata that is a function of three different variables, X, Y and Z, like:
gen new_var = (((X)*3) + ((Y)*2) + ((Z)*4))/7
All observations have missing values for one or two of the variables.
When I run the aforementioned command, it generates only missing values, because no observation has values for all three of the variables. I would like Stata to complete the function, ignoring the missing variables.
I tried the following commands without success:
gen new_var= (cond(missing(X*3),., X) + cond(missing(Y*2),., Y))/7
gen new_var= (!missing(X*3+Y*2+Z*4)/7)
gen new_var= (max(X , Y, Z)/7) if missing(X , Y, Z)
The egen command does not allow complicated functions; otherwise rowtotal() could work.
EDIT:
To clarify, "ignoring missing variables" means that even if any one of the component variables is not missing, then apply the function to only that variable and produce a value for the new variable. The new variable should have missing values only when all three component variables are missing.
I am going to guess that "ignoring missing values" means "treating them as zeros". If you have some other idea, you should make it explicit.
That could be
gen new_var = (cond(missing(X), 0, 3 * X) ///
+ cond(missing(Y), 0, 2 * Y) ///
+ cond(missing(Z), 0, 4 * Z)) / 7
Let's look at your attempted solutions and explain why each is wrong, either in general or usually.
(cond(missing(X*3),., X) + cond(missing(Y*2),., Y))/7
It is sufficient to note that if X is missing, then cond() yields missing, as X * 3 is then missing too. The same kind of remark applies to the terms involving Y and Z. So you're replacing any missing values with missing values, which is no gain.
!missing(X*3+Y*2+Z*4)/7
Given the information that at least one of X, Y, Z is always missing, this always evaluates to 0/7, that is 0. Even if X, Y, Z were all non-missing, it would evaluate to 1/7. That is a long way from the sum you want. missing() always yields 1 or 0, and its negation thus 0 or 1.
(max(X, Y, Z)/7) if missing(X , Y, Z)
The maximum of X, Y, Z will be the right answer if and only if one of the values is not missing and the other two are missing. max() ignores missings to the extent possible (even though in other contexts missings are treated as if arbitrarily large positive numbers).
If you just want to "ignore missing values" without "treating them as zeros", the following will work:
clear
set obs 10
generate X = rnormal(5, 2)
generate Y = rnormal(10, 5)
generate Z = rnormal(1, 10)
replace X = . in 2
replace Y = . in 5
replace Z = . in 9
generate new_var = (((X)*3) + ((Y)*2) + ((Z)*4)) / 7 if X != . | Y != . | Z != .
list
     +---------------------------------------------+
     |          X          Y          Z    new_var |
     |---------------------------------------------|
  1. |   3.651024    3.48609   -24.1695  -11.25039 |
  2. |          .   14.14995   8.232919          . |
  3. |   3.689442   9.812483   1.154064   5.044221 |
  4. |   2.500493   13.02909    5.25539   7.797317 |
  5. |    4.19431          .   6.584174          . |
  6. |   7.221717   13.92533   5.045283   9.956708 |
  7. |   5.746871   14.26329   3.828253   8.725744 |
  8. |   1.396223    16.2358   19.01479   16.10277 |
  9. |   4.633088   13.95751          .          . |
 10. |   2.521546   4.490258  -3.396854    .422534 |
     +---------------------------------------------+
Alternatively, you could also use the inlist() function:
generate new_var = (((X)*3) + ((Y)*2) + ((Z)*4)) / 7 if !inlist(., X, Y, Z)

Eigen3/C++: MatrixXd multiply one row with another

Using the Eigen3/C++ Library, given a MatrixXd
    / x0  ...  y0 \
    | x1  ...  y1 |
M = | ... ... ... |
    |             |
    \ xN  ...  yN /
what is the fastest method to achieve a modified version as shown below?
     / x0 * y0  ...  y0 \
     | x1 * y1  ...  y1 |
M' = | ...     ...  ... |
     |                  |
     \ xN * yN  ...  yN /
That is, one column (the one with the x's) is replaced by itself
multiplied element-wise with another column (the one with the y's).
Do you mean how to coefficient-wise assign-multiply the first and last column vectors? There are many ways of doing it, but the easiest/fastest might be
Eigen::MatrixXd M2 = M;
M2.leftCols<1>().array() *= M2.rightCols<1>().array();
An alternative might be constructing an uninitialized matrix with a given number of rows/cols and then block-assigning, like
Eigen::MatrixXd M2{ M.rows(), M.cols() };
M2.rightCols( M.cols() - 1 ) = M.rightCols( M.cols() - 1 );
M2.leftCols<1>() = M.leftCols<1>().cwiseProduct( M.rightCols<1>() );
Which is faster I don't know (but your preferred profiler does).
For future questions, here is the official Eigen quick reference ;)
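For completeness, a minimal self-contained example of the first approach (the 3x2 sample matrix is arbitrary, just to show the operation end to end):
#include <Eigen/Dense>
#include <iostream>

int main() {
    // First column holds the x values, last column the y values.
    Eigen::MatrixXd M(3, 2);
    M << 1, 4,
         2, 5,
         3, 6;

    Eigen::MatrixXd M2 = M;
    M2.leftCols<1>().array() *= M2.rightCols<1>().array();

    std::cout << M2 << "\n";  // first column becomes 4, 10, 18
}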

How to use the voodoo camera tracker?

I have the voodoo camera tracker software, which takes a video as an input and gives output in the following format:
# Text export
# created by voodoo camera tracker - www.digilab.uni-hannover.de
# Creation date: Mon Feb 28 18:41:56 2011
# The camera (one line per frame)
#
# Description of the CAHV camera model:
# -------------
# (Cx, Cy, Cz) : CameraPosition [mm]
# (Ax, Ay, Az) : RotationAxis2 [unit vector]
# (Hx, Hy, Hz) : RotationAxis0 [pel] (including FocalLength, PixelSizeX, and Principal Point offset)
# (Vx, Vy, Vz) : RotationAxis1 [pel] (including FocalLength, PixelSizeY, and Principal Point offset)
# (K3, K5) : Radialdistortion; K3 [1/(mm)^2] K5 [1/(mm)^4]
# (sx, sy) : PixelSize [mm/pel]
# (Width, Height) : ImageSize [pel]
# -------------
# (ppx, ppy) : Principal Point offset [pel]
# f : Focal Length [mm]
# fov : Horizontal Field of View [degree] = (2*atan(0.5*Width*sx/f)*180/PI;
# (H0x, H0y, H0z) : RotationAxis0 [unit vector]
# (V0x, V0y, V0z) : RotationAxis1 [unit vector]
# -------------
# (x, y) : image coordinates [pel]
# (X, Y, Z) : 3D coordinates [mm]
# -------------
# Projection of 3D coordinates in the camera image:
# [ x' ] = [ Hx Hy Hz ] [ 1 0 0 -Cx] [ X ]
# [ y' ] = [ Vx Vy Vz ] [ 0 1 0 -Cy] [ Y ]
# [ z' ] = [ Ax Ay Az ] [ 0 0 1 -Cz] [ Z ]
# [ 1 ]
# or
# [ x' ] = [f/sx 0 ppx] [ H0x H0y H0z ] [ 1 0 0 -Cx] [ X ]
# [ y' ] = [0 f/sy ppy] [ V0x V0y V0z ] [ 0 1 0 -Cy] [ Y ]
# [ z' ] = [0 0 1 ] [ Ax Ay Az ] [ 0 0 1 -Cz] [ Z ]
# [ 1 ]
# then x = x'/z' and y = y'/z' , if the origin of the image coordinates is in the center of the image
# or x = x'/z' + 0.5*(Width-1) and y = y'/z' + 0.5*(Height-1) , if the origin of the image coordinates is in the upper left corner
# -------------
# Cx Cy Cz Ax Ay Az Hx Hy Hz Vx Vy Vz K3 K5 sx sy Width Height ppx ppy f fov H0x H0y H0z V0x V0y V0z
Now this is a CAHV camera model, which gives the values for every frame. How can I extract camera parameters like translation, rotation and zoom from this output?
Thanks in advance..
This link gives the details of how to use the tracker.
As you can see, the last step says: Export the estimated camera parameters to the 3D animation package. So I exported into a Maya script file, and then I used Notepad++ to open the file, which gave the details regarding the camera parameters for each frame.
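If you want to extract the parameters yourself rather than go through Maya, the projection equations in the exported header already tell you how the pieces fit together: the world-to-camera rotation matrix has the unit vectors H0, V0, A as its rows, the translation is -R*C, and the "zoom" is the focal length f (or the fov computed from it). Here is a hedged sketch of that, using Eigen; the numeric values are placeholders, and in practice you would read H0, V0, A, C, f, sx and Width per frame from each exported line:
#include <Eigen/Dense>
#include <cmath>
#include <iostream>

int main() {
    // Placeholder values; read these per frame from the export
    // (the column order is given in the header's last comment line).
    Eigen::Vector3d C(0.0, 0.0, 0.0);    // (Cx, Cy, Cz): camera position [mm]
    Eigen::Vector3d H0(1.0, 0.0, 0.0);   // RotationAxis0 [unit vector]
    Eigen::Vector3d V0(0.0, 1.0, 0.0);   // RotationAxis1 [unit vector]
    Eigen::Vector3d A(0.0, 0.0, 1.0);    // RotationAxis2 [unit vector]
    double f = 35.0, sx = 0.01, width = 720.0;

    // Per the header, projection is K * [H0; V0; A] * [I | -C], so the
    // world-to-camera rotation has rows H0, V0, A.
    Eigen::Matrix3d R;
    R.row(0) = H0.transpose();
    R.row(1) = V0.transpose();
    R.row(2) = A.transpose();

    // Translation of the world-to-camera transform: t = -R * C.
    Eigen::Vector3d t = -R * C;

    // Zoom as horizontal field of view, from the formula in the header:
    // fov = 2*atan(0.5*Width*sx/f)*180/pi.
    const double pi = std::acos(-1.0);
    double fov = 2.0 * std::atan(0.5 * width * sx / f) * 180.0 / pi;

    std::cout << "R =\n" << R << "\nt = " << t.transpose()
              << "\nfov = " << fov << " degrees\n";
}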