Least Squares and Lattice Parameter Refinement

Overview of the Least-Squares Method

An observation equation is a conditional equation that expresses the relationship between quantities observed through experiments (observed quantities) and unknown parameters. For example, if the unknown parameters are \( a^1, a^2, \cdots, a^m\) and the observed quantity is \(Y\), then a relationship such as$$
f(a^1, a^2, \cdots, a^m) = Y$$ constitutes an observation equation. Note that this equality holds only when a perfect observation with absolutely no error has been made. If the observation equation is linear, it can be expressed as the following matrix multiplication:$$
f(a^1, a^2, \cdots, a^m) =
\begin{pmatrix}X^1 & X^2 & \cdots & X^m \end{pmatrix}
\begin{pmatrix}a^1 \\ a^2 \\ \vdots \\ a^m \end{pmatrix}
$$where \( X^1, X^2, \cdots, X^m\) are coefficients in the observation equation. Note that the superscripts appearing on this page represent parameter indices, not exponents.
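As a concrete instance (purely illustrative, not taken from the discussion above): fitting a straight line \(y = a^1 + a^2 x\) to measured pairs \((x, Y)\) is a linear observation equation with \(m = 2\) unknowns, whose coefficients are \(X^1 = 1\) and \(X^2 = x\):

```latex
% Straight-line fit y = a^1 + a^2 x, written as a linear observation equation:
f(a^1, a^2) =
\begin{pmatrix} 1 & x \end{pmatrix}
\begin{pmatrix} a^1 \\ a^2 \end{pmatrix}
= a^1 + a^2 x = Y
```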

If the scatter of the observed quantity \(Y\) follows a normal distribution and the observation equation is linear, the most likely values of the unknowns can be estimated by applying the least-squares method. We explain this below with an example.
Suppose that as a result of \(n\) observations, the following observation equation coefficients (\(X\)) and observed quantities (\(Y\)) are obtained:$$
(X^1_1, X^2_1, \cdots, X^m_1, Y_1), (X^1_2, X^2_2, \cdots, X^m_2, Y_2), \cdots,(X^1_n, X^2_n, \cdots, X^m_n, Y_n)$$The goal is to use these to estimate the unknown parameters \( a^1, a^2, \cdots, a^m\). Let us first define the following notation:$$
\mathbf{a} =
\begin{pmatrix} a^1 \\ a^2 \\ \vdots \\ a^m \end{pmatrix}
, \,\,\,
\mathbf{X}=
\begin{pmatrix}
X^1_1 & X^2_1 & \ldots & X^m_1 \\
X^1_2 & X^2_2 & \ldots & X^m_2 \\
\vdots & \vdots & \ddots & \vdots \\
X^1_n & X^2_n & \ldots & X^m_n
\end{pmatrix}
, \,\,\,
\mathbf{Y}=\begin{pmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
$$Also, since not all observations necessarily have equal precision, we define the weight matrix \(\mathbf{W}\) (a diagonal matrix whose diagonal elements are the weights, i.e., the reciprocals of the variances of the observations) as follows:
$$\mathbf{W}=
\begin{pmatrix}
w_1 & 0 & \ldots & 0 \\
0 & w_2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & w_n \\
\end{pmatrix}
$$The difference between the “quantity obtained from the observation equation” and the “actual observed quantity” at each observation can be expressed as the following column vector:$$
\begin{pmatrix}
f_1(a^1,a^2,\cdots,a^m)-Y_1\\
f_2(a^1,a^2,\cdots,a^m)-Y_2 \\
\vdots \\
f_n(a^1,a^2,\cdots,a^m) -Y_n
\end{pmatrix}
=\begin{pmatrix}
X^1_1 & X^2_1 & \ldots & X^m_1 \\
X^1_2 & X^2_2 & \ldots & X^m_2 \\
\vdots & \vdots & \ddots & \vdots \\
X^1_n & X^2_n & \ldots & X^m_n
\end{pmatrix}
\begin{pmatrix} a^1 \\ a^2 \\ \vdots \\ a^m \end{pmatrix}- \mathbf{Y}
= \mathbf{X}\mathbf{a}-\mathbf{Y}
$$Squaring each component of this column vector, multiplying by the respective weights, and summing them all gives the “weighted sum of squared residuals.” Writing this as \(\Delta\) and expressing it in matrix form:$$
\Delta=(\mathbf{a}^{tr} \, \mathbf{X}^{tr} - \mathbf{Y}^{tr})\, \mathbf{W} \, (\mathbf{X}\, \mathbf{a} -\mathbf{Y})
$$Here, the superscript \(tr\) denotes the transpose of a matrix.¹

Now, we want to find the \(\mathbf{a}\) that minimizes \(\Delta\). Recall a basic fact from calculus: \(\Delta\) is at a minimum precisely when the response \(\delta \Delta\) to an infinitesimal change \(\delta\mathbf{a}\) of \(\mathbf{a}\) vanishes. Expressing this mathematically:$$
\delta \Delta = \delta \mathbf{a}^{tr} \, \mathbf{X}^{tr} \, \mathbf{W}\, (\mathbf{X} \, \mathbf{a} - \mathbf{Y}) + (\mathbf{a}^{tr} \, \mathbf{X}^{tr} - \mathbf{Y}^{tr}) \, \mathbf{W} \, \mathbf{X}\, \delta \mathbf{a}
= 2\delta\mathbf{a}^{tr} \, (\mathbf{X}^{tr} \, \mathbf{W} \, \mathbf{X} \, \mathbf{a} - \mathbf{X}^{tr} \, \mathbf{W} \, \mathbf{Y}) = 0$$Since this must hold for every \(\delta\mathbf{a}\), rearranging gives:$$
\mathbf{a} = (\mathbf{X}^{tr} \, \mathbf{W} \, \mathbf{X})^{-1} \, \mathbf{X}^{tr}\, \mathbf{W}\, \mathbf{Y} $$This is the equation for estimating the unknown parameters \(\mathbf{a}\). If all diagonal elements of the weight matrix \( \mathbf{W}\) can be treated as 1 (i.e., if all observation errors are equal):$$
\mathbf{a} = (\mathbf{X}^{tr} \, \mathbf{X})^{-1} \, \mathbf{X}^{tr} \, \mathbf{Y} $$
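The estimator above can be sketched directly with NumPy (the observations below are invented for illustration):

```python
import numpy as np

# Invented example: n = 5 observations, m = 2 unknown parameters.
# Each row of X holds the coefficients (X^1_i, X^2_i) of one observation.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([0.9, 3.1, 5.0, 7.2, 8.9])   # observed quantities
w = np.array([1.0, 1.0, 4.0, 1.0, 1.0])   # weights 1/(error_i)^2
W = np.diag(w)

# a = (X^tr W X)^{-1} X^tr W Y, solved without forming the inverse explicitly:
a = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
```

For the unit-weight case, `np.linalg.lstsq(X, Y)` returns the same estimate and is numerically more robust when \(\mathbf{X}\) is ill-conditioned.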


Optimization of Lattice Parameters

Suppose we have performed some diffraction experiment and obtained a set of Miller indices and the corresponding d-spacings. Let us consider calculating lattice parameters from these data. The calculation of lattice parameters involves different observation equations depending on the crystal system. For example, for a triclinic system:
$$
\frac{1}{d^2}=
\begin{pmatrix}h^2 & k^2 & l^2 & 2kl & 2lh & 2hk \end{pmatrix}
\begin{pmatrix}{a^*}^2 \\ {b^*}^2 \\ {c^*}^2 \\ b^*c^*\cos\alpha^* \\ c^*a^*\cos\beta^* \\ a^*b^*\cos\gamma^* \end{pmatrix}
$$Here, \(a^*, b^*\), etc. are the reciprocal lattice parameters of this crystal, which are the unknown parameters, and \(h,k,l\) can be regarded as coefficients in the observation equation. Once the observation equation is known, we simply follow the procedure described above to determine the reciprocal lattice parameters. However, one somewhat troublesome issue is how to handle the error matrix.
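As a sketch (with an invented lattice and reflection list), the coefficient rows can be assembled and solved with NumPy. Each row \((h^2, k^2, l^2, 2kl, 2lh, 2hk)\) follows the standard reciprocal-metric expansion \(1/d^2 = h^2{a^*}^2 + k^2{b^*}^2 + l^2{c^*}^2 + 2kl\,b^*c^*\cos\alpha^* + 2lh\,c^*a^*\cos\beta^* + 2hk\,a^*b^*\cos\gamma^*\):

```python
import numpy as np

# Minimal sketch: solve the triclinic observation equations for the six
# reciprocal-metric unknowns. The lattice and (h, k, l) list are invented,
# and the "observed" 1/d^2 values are generated noise-free for illustration.
astar, bstar, cstar = 0.105, 0.112, 0.093     # reciprocal axes (1/Angstrom)
cos_a, cos_b, cos_g = 0.05, -0.03, 0.08       # cosines of reciprocal angles
truth = np.array([astar**2, bstar**2, cstar**2,
                  bstar * cstar * cos_a,
                  cstar * astar * cos_b,
                  astar * bstar * cos_g])

hkls = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
        (1, 0, 1), (0, 1, 1), (1, 1, 1), (2, 1, 0)]
# One observation-equation row per reflection: (h^2, k^2, l^2, 2kl, 2lh, 2hk)
X = np.array([[h*h, k*k, l*l, 2*k*l, 2*l*h, 2*h*k] for h, k, l in hkls])
Y = X @ truth                                 # exact 1/d^2 "observations"

a = np.linalg.solve(X.T @ X, X.T @ Y)         # unweighted least squares
# a[0] recovers {a*}^2, so astar = sqrt(a[0]), and so on.
```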

First, for simplicity, let us assume that only the Bragg angle \(\theta\) contains error (in reality, various factors are involved). From the Bragg condition \(\lambda = 2d\sin\theta\) we have$$1/d^2 = 4\sin^2\theta/\lambda^2$$so the response of \(1/d^2\) to a change in \(\theta\) is$$\partial{(1/d^2)}/\partial{\theta} = 4\sin 2\theta/\lambda^2$$This means that when \(\theta\) changes by \(\delta{\theta}\), \(1/d^2\) changes by \(\delta{\theta}\times 4\sin 2\theta/\lambda^2\); the error of \(1/d^2\) is therefore proportional to \(\sin 2\theta\). So, for now, let us set the diagonal elements of the weight matrix for the observed values \(1/d^2\) to the reciprocal of the square of \(\sin 2\theta\), i.e., \(1/\sin^{2}2\theta\). That is,
$$\mathbf{W'}=
\begin{pmatrix}
1/\sin^{2}2\theta_1 & 0 & \ldots & 0 \\
0 & 1/\sin^{2}2\theta_2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & 1/\sin^{2}2\theta_n \\
\end{pmatrix}$$ However, \(1/\sin^{2}2\theta\) is not the reciprocal of the variance of each \(1/d^2\) itself, but only proportional to it. The true weight matrix \(\mathbf{W}\) is$$
\mathbf{W}= z \, \mathbf{W'}
$$that is, \(\mathbf{W'}\) multiplied by an appropriate positive real number \(z\).

However, even if we substitute this \(\mathbf{W'}\), which is not the true weight matrix, into the formula above and compute \(\mathbf{a} = (\mathbf{X}^{tr} \, \mathbf{W'} \, \mathbf{X})^{-1} \, \mathbf{X}^{tr}\, \mathbf{W'}\, \mathbf{Y}\), we still obtain the optimal values of \(\mathbf{a}\) without any problem. This is because the weight matrix appears once inside the inverse and once outside it, so the unknown factor \(z\) conveniently cancels out. A nicely designed system, isn't it?
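This cancellation is easy to verify numerically; in the sketch below (random, invented data), scaling the weight matrix by an arbitrary factor leaves the estimate unchanged:

```python
import numpy as np

# Sketch: the unknown scale z between W = z * W' drops out of the estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                   # invented coefficient matrix
Y = rng.normal(size=8)                        # invented observations
Wp = np.diag(rng.uniform(0.5, 2.0, size=8))   # relative weights W'

def fit(X, W, Y):
    """Weighted least-squares estimate a = (X^tr W X)^{-1} X^tr W Y."""
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

a1 = fit(X, Wp, Y)
a2 = fit(X, 7.3 * Wp, Y)      # any positive scale z gives the same result
```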

So now we have obtained the parameters \( \mathbf{a}\). Let us reconsider the errors. By error propagation, if we had used the correct \( \mathbf{W}\), the diagonal elements of the matrix \( (\mathbf{X}^{tr} \mathbf{W} \mathbf{X})^{-1} \) would give the variance of each parameter. However, as mentioned above, since \(\mathbf{W'}\) only determines the relative proportions, we do not know the absolute values. Here, let us take a slightly different perspective and recall a standard property of the weighted sum of squared residuals: on average,$$
N - P = \sum^{N}_{i=1}\frac{\delta_i^2}{s_i}
$$ (where \(N\): number of data points, \(P\): number of parameters, \(\delta_i\): residual of the \(i\)-th data point, \(s_i\): variance of the \(i\)-th data point). Writing \(w_i\) for the diagonal elements of \(\mathbf{W'}\), we have \(s_i = 1/(z w_i)\), so this relation lets us estimate \(z\) from the optimized parameters. The absolute variance of the \(j\)-th data point then follows as$$
s_j = \frac{\displaystyle\sum^{N}_{i=1}\delta_i^2 w_i}{w_j (N-P)}
$$and the parameter variances are the diagonal elements of \((1/z)\,(\mathbf{X}^{tr}\,\mathbf{W'}\,\mathbf{X})^{-1}\). The square root of a variance gives the corresponding error. Note that these are errors for \(\mathbf{a}\) (i.e., the reciprocal lattice parameters), so to convert them back to errors in the direct lattice parameters, one needs to further propagate the errors. This can be done with patience, so we omit it here.
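Putting the pieces together, a sketch of this error estimate on synthetic data (all numbers invented) might look like:

```python
import numpy as np

# Synthetic sketch of the error estimate described above.
rng = np.random.default_rng(1)
N, P = 20, 3
X = rng.normal(size=(N, P))                    # invented coefficients
a_true = np.array([1.0, -2.0, 0.5])
w = rng.uniform(0.5, 2.0, size=N)              # relative weights (e.g. 1/sin^2 2theta)
sigma = 0.05 / np.sqrt(w)                      # noise consistent with the weights
Y = X @ a_true + rng.normal(size=N) * sigma
W = np.diag(w)

a = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)  # optimized parameters
delta = X @ a - Y                              # residuals

# Estimate z from  N - P = sum(delta_i^2 / s_i)  with  s_i = 1/(z w_i):
z = (N - P) / (delta**2 * w).sum()
cov = np.linalg.inv(X.T @ W @ X) / z           # absolute parameter covariance
errors = np.sqrt(np.diag(cov))                 # one-sigma error of each parameter
```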


  1. The relation \(( \mathbf{X}\, \mathbf{a} -\mathbf{Y})^{tr} = \mathbf{a}^{tr} \, \mathbf{X}^{tr} – \mathbf{Y}^{tr}\) is used to convert the column vector \(\mathbf{X}\, \mathbf{a} -\mathbf{Y}\) into a row vector. ↩︎