==============================================================================
Notes to myself following a talk I heard on "multidimensional scaling", which
engendered some discussion among the faculty regarding the role of the
triangle inequality. File was dated 1996/9/23. -- rusin@math.niu.edu
==============================================================================
The problem: a set of points is given with the distances between all pairs
of points known. How do we find a distance-preserving embedding into
Euclidean space?
Remark: in general, one can only embed into Euclidean spaces of dimensions
which are very high (on the order of the number of points). For example,
if d(x, y) = 1 for all x<>y, then n points form a regular simplex,
which requires R^(n-1). Thus we may need to approximate the embedding.
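A quick numerical check of this remark (a sketch; the code and variable
names are mine, using the inner-product construction of the theorem
further down): n points at mutual distance 1 give an inner-product matrix
of rank n-1, so an exact embedding needs all of R^(n-1).

```python
import numpy as np

n = 5
B = np.ones((n, n)) - np.eye(n)   # squared distances: 1 off the diagonal
c = B.sum(axis=1) / n             # row means of B
A = -0.5 * (B - c[:, None] - c[None, :] + B.sum() / n**2)
rank = np.linalg.matrix_rank(A)   # comes out to n - 1
```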
Also note that if the triangle inequality is not satisfied then no
Euclidean embedding is possible; in that case, however, we will see there
is an embedding into R^n with a non-positive-definite metric.
If an embedding exists it is unique up to isometry (a translation
composed with an orthogonal map: a rotation in SO(n), possibly followed
by a reflection). We will choose the
embedding which has the center of mass at the origin and the axes
oriented in the directions of principal separation of the points.
Theorem: (Schoenberg, Householder, and Young) Given a symmetric matrix
B (of squares of distances) we can compute a matrix A of inner products, namely
a_ij = (-1/2)*(b_ij - c_i - c_j + d)
where c_i = (1/n) Sum(b_ij, j); d = (1/n^2) Sum(b_ij, i,j).
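In code, the passage from B to A is a "double-centering" of B; here is a
minimal numpy sketch (the function name double_center is mine, not from
the talk):

```python
# Double-centering: turn a symmetric matrix B of squared distances into
# the matrix A of inner products, per the Schoenberg/Householder/Young
# formula a_ij = (-1/2)(b_ij - c_i - c_j + d).
import numpy as np

def double_center(B):
    """B: (n, n) symmetric matrix of squared distances."""
    B = np.asarray(B, dtype=float)
    n = B.shape[0]
    c = B.sum(axis=1) / n        # c_i = (1/n) Sum(b_ij, j)
    d = B.sum() / n**2           # d   = (1/n^2) Sum(b_ij, i, j)
    return -0.5 * (B - c[:, None] - c[None, :] + d)
```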
[Mohsen: Ernst Young ca 1900, if I heard him right]
Note that if there is an isometric embedding of the points to points
v_i in an inner-product space, then b_ij := d(v_i, v_j)^2 =
<v_i - v_j, v_i - v_j> = <v_i, v_i> + <v_j, v_j> - 2<v_i, v_j>; if the
embedding has center-of-mass at the origin, then
n c_i = Sum(b_ij, j) = n |v_i|^2 + Sum( |v_j|^2, j) - 2<v_i, Sum(v_j, j)>
= n |v_i|^2 + Sum( |v_j|^2, j). Then n^2 d = 2n Sum( |v_i|^2, i), so
a_ij = (-1/2)(b_ij - c_i - c_j + d) = <v_i, v_j>.
Thus A represents a quadratic form. Diagonalize this form: that is, find
an orthogonal U so that A = U D U^t with D real diagonal.
We may assume the eigenvalues of D are in decreasing order. Then
A = d1 u1 u1^t + d2 u2 u2^t + ... where u_i is the ith column of U.
In particular, if all d_i are positive and d_k >> d_(k+1), then
A' = d1 u1 u1^t + ... + dk uk uk^t is a good approximation of A. Taking
w_i = (d_i)^(1/2)*u_i for the columns of a matrix W (having only k
columns) we have A' = W W^t. Thus the rows of W are n vectors in
R^k whose inner products are roughly the same as those of the n
given points. Equivalently, the u_j give coordinate functions for
the n points: row i of W is (x_1(P_i), ..., x_k(P_i)), where each
coordinate function is linear in the point, x_j(P_i) = sqrt(d_j)*(u_j)_i.
Remark: A' is the best approximation to A in the least squares sense
among matrices of rank <= k.
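The whole procedure (center, diagonalize, truncate, rescale) fits in a
few lines of numpy; a sketch under my own names, not a reference
implementation:

```python
# Classical multidimensional scaling: from squared distances B to
# approximate coordinates in R^k.  Rows of the result are the points,
# centered at the origin, with axes along the principal directions.
import numpy as np

def mds_embed(B, k):
    B = np.asarray(B, dtype=float)
    n = B.shape[0]
    c = B.sum(axis=1) / n
    A = -0.5 * (B - c[:, None] - c[None, :] + B.sum() / n**2)
    evals, U = np.linalg.eigh(A)           # ascending eigenvalues
    order = np.argsort(evals)[::-1]        # reorder: decreasing, as above
    evals, U = evals[order], U[:, order]
    # w_i = sqrt(d_i) u_i; clip guards against tiny negative eigenvalues
    return U[:, :k] * np.sqrt(np.clip(evals[:k], 0.0, None))
```

For the 3-4-5 example below, mds_embed(B, 2) returns three points in the
plane whose pairwise distances are 3, 4, and 5.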
Mohsen: recent luck using nonlinear functions x_i.
Example: 3 points with distances 3, 4, 5:
B = ((0,9,16),(9,0,25),(16,25,0))
A = (1/9)((25,-2,-23),(-2, 52, -50), (-23, -50, 73))
e'values: 13.0, 3.7, 0 (the 0 is exact: a 3-4-5 triangle lies in a plane)
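Checking the example numerically (a sketch with my own variable names;
note that B must be entered symmetrically, b_31 = 16 and b_32 = 25):

```python
# Verify the 3-4-5 example: the nonzero eigenvalues of A are
# (50 +/- sqrt(772))/6, approximately 12.96 and 3.70; the third is 0
# because the triangle lies in a plane.
import numpy as np

B = np.array([[0., 9., 16.],
              [9., 0., 25.],
              [16., 25., 0.]])
n = B.shape[0]
c = B.sum(axis=1) / n
A = -0.5 * (B - c[:, None] - c[None, :] + B.sum() / n**2)
evals = np.sort(np.linalg.eigvalsh(A))[::-1]   # decreasing order
```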
Note that the method tends to place the points along well-defined axes
(the directions of the eigenvectors with the greatest eigenvalues --
evidently the directions most useful for separating points, i.e. the
longest directions of the eventual cloud of points). These axes are
well-defined to the extent that the eigenvalues are really distinct.
Application: given points whose distances are defined according to
several possibly related criteria, one can label the principal axes
according to properties evident when comparing points for which
x_i(P) is very large to those where x_i(P) is very negative.