Empirical Orthogonal Functions

We will use a simple example to illustrate how EOFs are related to the data from which they are calculated, and to show two ways of calculating them. The second method, using the Singular Value Decomposition, is usually preferred in practice.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

Make a fake dataset with 4 spatial points. Note that we are arranging our array with time as the first dimension, increasing with row number, so each column is a time series. We could use any number of spatial locations (number of columns), and they don't have to be arranged in any particular way. They could be all points in a rectangular lon-lat grid, for example, strung out into a single list of points by flattening the 2-D array. To begin with, it is easiest to see how the calculation works by keeping the number of spatial points very small, hence our choice of 4 points; but in general there can even be more spatial points than temporal samples, and in fact there is nothing special about "space" and "time", or the choice of which one is the first dimension and which is the second.
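As a minimal sketch (using made-up dimensions, not part of this example), a gridded field could be flattened into such a matrix and restored afterwards like this:

# Hypothetical (nt, nlat, nlon) gridded field: flatten so each column is the
# time series at one grid point, then restore the grid after the analysis.
field = np.random.randn(100, 8, 10)
flat = field.reshape(field.shape[0], -1)   # shape (100, 80)
restored = flat.reshape(field.shape)       # back to (100, 8, 10)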

In [2]:
nx = 4
nt = 100
t = np.linspace(0, 1, nt)
dat = np.zeros((nt, nx), dtype=float)
y1 = np.sin(t * (2 * np.pi / 0.11))
y2 = np.sin(t * (2 * np.pi / 0.31))
dat[:,0] = y1
dat[:,1] = 0.5 * y1 + 0.6 * y2
dat[:,2] = 0.25 * y1 + 0.8 * y2
dat[:,3] = y2
# Save a copy of the pure signal.
signaldat = dat.copy()

# Add noise.
noisefac = 0.15
np.random.seed(0) # make the "random" numbers repeatable
dat += noisefac * np.random.randn(nt, nx)

fig, ax = plt.subplots()
ax.plot(t, dat) # plot the 4 time series (columns)
ax.set_title("Fake time series from 4 locations")
ax.legend(["0", "1", "2", "3"])
Out[2]:
<matplotlib.legend.Legend at 0x118b2e990>

Note that we have designed our fake data so that there should be correlation through y1 among the first three columns, and through y2 among the last three. Although there are 4 columns, there are really only 2 signals--all 4 columns are, apart from the noise, just linear combinations of two functions. This is hardly obvious from the plot of the columns, though; it all looks rather messy.
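As a quick check (a sketch using the arrays defined above), the noise-free matrix should have rank 2, because its columns are linear combinations of just y1 and y2:

# The pure-signal matrix has only 2 linearly independent columns.
print(np.linalg.matrix_rank(signaldat))   # expect 2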

EOFs calculated using the eigenvalues and eigenvectors of the covariance matrix

Calculate the eigenvalues and eigenvectors from the covariance matrix, or from that matrix times the number of time samples, nt--in other words, from the summed products rather than from the mean products. You can verify that the end result is the same, apart from a scale factor of nt in the eigenvalues.

Typically one wants to remove the time mean at each spatial point, so that the dataset is separated into a time-mean and the variation about that mean. We will do that here.

If our de-meaned data matrix is $A$, then nt times the covariance matrix is $A' A$.

In [3]:
datmean = dat.mean(axis=0) # time mean
dat_dm = dat - datmean # numpy broadcasting at work...

covdat = (dat_dm.T @ dat_dm) / nt # matrix multiplication 
# this is the same as np.cov(dat.T, bias=True)
# print(np.cov(dat.T, bias=True))
vals, vecs = np.linalg.eig(covdat)
print("Covariance matrix is:\n", covdat)
print("Eigenvalues are:\n", vals)
print("Eigenvectors are columns of:\n", vecs)
Covariance matrix is:
 [[ 0.50871278  0.24546301  0.11687411 -0.00233331]
 [ 0.24546301  0.31461159  0.2830365   0.2856888 ]
 [ 0.11687411  0.2830365   0.33663063  0.37870861]
 [-0.00233331  0.2856888   0.37870861  0.51310968]]
Eigenvalues are:
 [1.09710862 0.53671203 0.02307027 0.01617376]
Eigenvectors are columns of:
 [[-0.31656187 -0.8578794  -0.40356405 -0.03110597]
 [-0.50989967 -0.16339769  0.7123446   0.45372756]
 [-0.53468016  0.14692667  0.16987344 -0.81466109]
 [-0.5949017   0.46449617 -0.5484927   0.35984862]]
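As a quick check of the scale-factor claim above (a sketch using the arrays just computed), the eigenvalues of the summed-product matrix come out nt times larger, while the eigenvectors are the same apart from order and sign:

# Eigenvalues of A'A are nt times those of the covariance matrix A'A / nt.
vals_sum, vecs_sum = np.linalg.eig(dat_dm.T @ dat_dm)
print(np.allclose(np.sort(vals_sum), np.sort(vals * nt)))   # expect True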

The eigenvalues of a covariance matrix are non-negative, but np.linalg.eig does not return them in any particular order; let's sort the eigenvalues and eigenvectors in descending order. The argsort() function returns the indices that sort them in ascending order, and then we use [::-1] indexing to reverse the order.

(To be clear: in this particular example the eigenvalues happen to have come out in sorted order, but you can't count on it. That's why we are demonstrating how to sort them.)

In [4]:
isort = np.argsort(vals)[::-1]
vals = vals[isort]
vecs = vecs[:, isort] # re-arrange the columns
print("Eigenvalues are:\n", vals)
print("Eigenvectors are columns of:\n", vecs)
Eigenvalues are:
 [1.09710862 0.53671203 0.02307027 0.01617376]
Eigenvectors are columns of:
 [[-0.31656187 -0.8578794  -0.40356405 -0.03110597]
 [-0.50989967 -0.16339769  0.7123446   0.45372756]
 [-0.53468016  0.14692667  0.16987344 -0.81466109]
 [-0.5949017   0.46449617 -0.5484927   0.35984862]]

Now we can project the original de-meaned data onto the new set of spatial basis functions to get the time series corresponding to each spatial eigenvector. This projection is a rotation, so the time mean of each series after rotation is still zero, within the limits of numerical accuracy.

In [5]:
tvecs = dat_dm @ vecs  # projection on new basis
print("time means after projection:\n", tvecs.mean(axis=0))
print("are all very small.")
plt.figure()
plt.plot(t, tvecs)
plt.legend(["0", "1", "2", "3"])
plt.title("EOF temporal functions")
time means after projection:
 [ 2.66453526e-17  4.95437025e-17 -3.52495810e-17 -7.91033905e-18]
are all very small.
Out[5]:
Text(0.5, 1.0, 'EOF temporal functions')

Next, verify that the eigenvalues are the variances of the temporal expansions. (By default, var() uses ddof=0, dividing by nt rather than nt - 1, consistent with the bias=True covariance above.)

In [6]:
print("Variance:\n", tvecs.var(axis=0))  # along time axis
print("Eigenvalues:\n", vals)
Variance:
 [1.09710862 0.53671203 0.02307027 0.01617376]
Eigenvalues:
 [1.09710862 0.53671203 0.02307027 0.01617376]

Notice that the first two eigenvalues are much larger than the other two, and that you can see the periodicity predominantly of y2 in the first one, of y1 in the second one, and nothing more than noise in the other two.
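Expressed as fractions of the total variance (a quick computation using vals from above), the first two EOFs account for nearly all of it:

# Fraction of the total variance associated with each EOF.
print(vals / vals.sum())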

We might have hoped that the EOF decomposition would more perfectly separate y2 from y1, but it can't magically do this. The EOFs are constructed simply as the orthogonal basis that diagonalizes the covariance matrix.
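That statement can be checked directly (a sketch using the arrays defined above): rotating the covariance matrix into the eigenvector basis should yield a diagonal matrix with the eigenvalues on the diagonal.

# vecs.T @ covdat @ vecs should equal diag(vals), within round-off.
print(np.allclose(vecs.T @ covdat @ vecs, np.diag(vals)))   # expect True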

Let's reconstruct the original data array by summing the EOFs and adding back the time mean.

In [7]:
dat_from_eofs = tvecs @ vecs.T + datmean
fig, ax = plt.subplots()
ax.plot(t, dat_from_eofs)
ax.set_prop_cycle(None) # Resets the color cycle.
ax.plot(t, dat, '.')
ax.set_title("Reconstructed; original (dotted)")
Out[7]:
Text(0.5, 1.0, 'Reconstructed; original (dotted)')

Notice that the round trip, from dat_dm to tvecs and back to dat_dm, involved right-multiplication of dat_dm by vecs and then right-multiplication of the result by vecs.T. Equivalently, working with the transposed data matrix, we could have left-multiplied dat_dm.T by vecs.T and then left-multiplied the result by vecs. Either way, the round trip involves the product $X X'$, where $X$ is vecs. Because the columns of $X$ are orthonormal, $X X'$ is the identity matrix--hence the round trip. In the matrix notation used above, the data matrix $A$ is dat_dm, with each column being a time series at a given spatial location. Then $A X$ is a matrix whose columns are linear combinations of the columns of dat_dm; each row is the projection of the corresponding original row onto a rotated coordinate system in which the first axis is oriented in the direction of greatest variance, and so on.
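A short check of that orthonormality, using vecs and dat_dm from above:

# X X' is the identity, so projecting and then un-projecting is a no-op.
print(np.allclose(vecs @ vecs.T, np.eye(nx)))        # expect True
print(np.allclose(dat_dm @ vecs @ vecs.T, dat_dm))   # expect True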

What happens if we sum only the first two EOFs, the ones with most of the variance and with structure that looks like signal?

In [8]:
dat_from_2eofs = tvecs[:, :2] @ vecs[:, :2].T  # keep only the first 2 EOFs
fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, 
                                    sharex=True,
                                    sharey=True,
                                    figsize=(6,10))
ax0.plot(t, dat_from_eofs - datmean)
ax0.set_title('From all EOFs')
ax1.plot(t, dat_from_2eofs)
ax1.set_title('From first 2 EOFs')
ax2.plot(t, signaldat - signaldat.mean(axis=0))
ax2.set_title('De-meaned signal (no noise)')
Out[8]:
Text(0.5, 1.0, 'De-meaned signal (no noise)')

You can see that throwing away the last 2 EOFs has reduced the noise a little bit, giving a better approximation to the pure signal. This noise reduction would be more effective if we had more spatial points, such as with basin-scale maps of SSH or SST.
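One rough way to quantify this (a sketch using the arrays defined above) is to compare the RMS difference from the pure, de-meaned signal with and without the last two EOFs; the truncated reconstruction should come out somewhat closer.

signal_dm = signaldat - signaldat.mean(axis=0)
rms_all = np.sqrt(((dat_dm - signal_dm) ** 2).mean())           # all 4 EOFs (original noise)
rms_two = np.sqrt(((dat_from_2eofs - signal_dm) ** 2).mean())   # first 2 EOFs only
print(rms_all, rms_two)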

EOFs using Singular Value Decomposition

The simplest way to calculate EOFs is to use the Singular Value Decomposition (SVD), which essentially does the whole calculation in one step. One has to be a little bit careful about precisely how the svd function is implemented--mainly about what it returns.

The basic idea of SVD is that any matrix can be factored as the product of three other matrices, the middle one of which is diagonal, with the other two being orthonormal. Again let our dat_dm be $A$, with dimensions (nt, nx). Then $A = U D V'$, where the columns of $U$ are the eigenvectors of $A A'$ and the columns of $V$ are the eigenvectors of $A' A$. $A A'$ and $A' A$ have the same nonzero eigenvalues, which are the squares of the nonzero diagonal elements of $D$.

In this formulation, $U$ is (nt, nt), $D$ is (nt, nx), and $V$ is (nx, nx), but the number of non-zero singular values is at most the lesser of nt and nx; let's call that lesser dimension ne. Then, omitting the columns corresponding to zero singular values, the dimensions of $U$ can be (nt, ne), of $D$ can be (ne, ne), and of $V'$ can be (ne, nx).

Implementations of SVD can return full-sized or reduced matrices, and $D$ can be returned as a matrix or as a 1-D array holding just the diagonal values. We will use np.linalg.svd with the full_matrices=False keyword argument; consequently we will get the reduced arrays and just the 1-D array of singular values. In addition, note that this function returns $U$, the diagonal elements of $D$, and $V'$ (already transposed).

In [9]:
u, d, vt = np.linalg.svd(dat_dm, full_matrices=False)

print("Eigenvalues from A'A:\n", vals)
print("Eigenvalues from SVD:\n", (d ** 2) / nt)  # Note the power of 2.

vecs_svd = vt.T
print("\nEigenvectors from A'A (columns):\n", vecs)
print("Eigenvectors from SVD (columns):\n", vecs_svd)

tvecs_svd = u @ np.diag(d)  # temporal expansions; same as u * d by broadcasting

plt.figure()
plt.plot(t, tvecs_svd)
plt.legend(["0", "1", "2", "3"])
Eigenvalues from A'A:
 [1.09710862 0.53671203 0.02307027 0.01617376]
Eigenvalues from SVD:
 [1.09710862 0.53671203 0.02307027 0.01617376]

Eigenvectors from A'A (columns):
 [[-0.31656187 -0.8578794  -0.40356405 -0.03110597]
 [-0.50989967 -0.16339769  0.7123446   0.45372756]
 [-0.53468016  0.14692667  0.16987344 -0.81466109]
 [-0.5949017   0.46449617 -0.5484927   0.35984862]]
Eigenvectors from SVD (columns):
 [[-0.31656187  0.8578794  -0.40356405 -0.03110597]
 [-0.50989967  0.16339769  0.7123446   0.45372756]
 [-0.53468016 -0.14692667  0.16987344 -0.81466109]
 [-0.5949017  -0.46449617 -0.5484927   0.35984862]]
Out[9]:
<matplotlib.legend.Legend at 0x11b75ff50>
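A couple of quick checks of the reduced-form shapes and the factorization itself, using the arrays from the cell above:

print(u.shape, d.shape, vt.shape)                 # expect (100, 4) (4,) (4, 4)
print(np.allclose(u @ np.diag(d) @ vt, dat_dm))   # expect True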

Notice that the eigenvectors calculated using the two different methods are the same except for possible sign reversals. Multiplying an eigenvector by any nonzero scalar gives another eigenvector for the same eigenvalue, so the overall sign is arbitrary. By convention, eigenvectors are normalized to unit length, but there is no convention for determining the sign.
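If a consistent sign is wanted, one has to impose a convention. Here is a minimal sketch of one common choice (the helper fix_signs is just an illustration, not part of NumPy): flip each column so that its largest-magnitude element is positive, after which the two sets of eigenvectors should agree.

def fix_signs(v):
    # Flip each column of v so that its largest-magnitude element is positive.
    biggest = v[np.abs(v).argmax(axis=0), np.arange(v.shape[1])]
    return v * np.sign(biggest)

print(np.allclose(fix_signs(vecs), fix_signs(vecs_svd)))   # expect True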
