Loading [MathJax]/jax/output/HTML-CSS/jax.js

Harmonic analysis and the Fourier transform¶

There are two types of situation in which it is particularly useful to think of a time series of observations as a sum of sinusoids:

  • When one knows, based on physical considerations, that the big signal really is very accurately expressed as a sum of a few sinusoids of known frequency. The tides are a perfect example; astronomical forcing occurs at very well-defined and well-known frequencies, so the ocean responds at those frequencies.
  • In the opposite situation, when there is little or no perceptible periodicity or other organization, and when a record from one time period looks similar in character, but entirely different in specifics, to that from another time period. An example is the measurement of almost any variable in a turbulent flow.

In the first of these cases, one might analyze the time series by using a least-squares procedure to find out the amplitude and phase of each of the known sinusoids.

In the second case, one wants to find out how the energy is distributed among a range of frequencies. For this, the Fourier transform is tailor-made. It's worth taking some time to understand what it is and how it works. We will concentrate on the discrete transform and its inverse; they are what we use in practice for data analysis. The concepts generalize easily to continuous functions.

We will take a linear algebra point of view, and start with a trivial time series: a variable y sampled at only two times, t0 and t1. The variable can be plotted as a vector--a point--in 2-D space, with one axis for y0=y(t0) and a second for y1=y(t1). Remember, this is not a physical space, it is an information space; there are two independent pieces of information required to fully specify y. The axes are perpendicular.

This specification of y is not unique; the same vector could be expressed as the sum of components along axes rotated by some angle relative to the originals. Rotating by 45 degrees CCW makes the first axis (√2/2)[1,1] and a subsequent reflection makes the second axis (√2/2)[1,−1], where we are expressing the unit vectors of the new axes in terms of the unit vectors of the old.

Let that first axis be the top row in a matrix, M, and let the second be the bottom row. Then if y′ is y expressed in terms of the rotated axes, we have y′=My. M is orthonormal--its inverse is its transpose--so y=MTy′. Notice that y and y′ are describing the same vector, but in terms of its projection onto different sets of unit vectors. The first axis of y′ is proportional to the average of y and the second axis is proportional to the difference, y0−y1 .

Now generalize this picture. Instead of two times, let there be 3, or 4, or any finite number. Above 3 we can no longer use the visual analogy of mapping each time to a spatial dimension, but the mathematics stays the same. If we have N data points, we can always express those N pieces of information by projecting them onto N linearly independent (orthogonal) axes. Because they are orthogonal, this projection is identical to a least-squares fit of the vector y to the new set of axes.

Back to the case N=2: with 2 points in the record, if we imagine that the record is extended by repeating the same pair of points again and again, we have two possible periodicities: zero cycles, or 1 full cycle in the original record length. Using the sample spacing as the time unit, the frequencies are ω0=2π⋅0/N and ω1=2π⋅1/N radians per unit time. The matrix M then has entries Mnk=√22cos(2πnk/2) .

For N>2, should we continue this pattern using cosines? We could, yielding a cosine transform; but we can do better by noting that except for the zero frequency and the N/2 cycles per record length frequency (keeping N even henceforth for simplicity), it makes more sense to continue selecting frequencies with an integral number of cycles per record length, and to let each have a cosine component and a sine component, hence an amplitude and a phase. This yields the discrete Fourier transform.

Discrete Fourier Transform¶

The Discrete Fourier Transform and its inverse are usually implemented using complex exponentials. The version provided by numpy (https://docs.scipy.org/doc/numpy/reference/routines.fft.html) is typical. Suppose you have a series of N points, an. A typical example would be a time series, say sea-surface height, or current vectors from a mooring, sampled regularly at hourly intervals. Then Δt=1 hour, so if N=240 the length of the series would be L=10 days or 240 hours. The nth time is tn=nΔt.

The Fourier frequencies are an integer number of cycles per record length: 0,1,2...N−1 cycles per NΔt, or 0,1/(NΔt),2/(NΔt),...(N−1)/(NΔt) cycles per unit time (where Δt is the unit time). Define frequency k as fk=k/(NΔt) cycles per unit time. Then the DFT maps the time series onto constituent frequency amplitudes and phases, Ak like this:

Ak=N−1∑n=0anexp(−i2πfktn)

with the inverse relation:

an=1NN−1∑k=0Akexp(i2πfktn)

Notice the near-symmetry: the expressions differ by the sign of the argument to the exponential, and by the 1/N normalization of the inverse.

To relate these expressions to the forms in the numpy documentation link, evaluate the product in the argument: fktn=(kNΔt)(nΔt)=knN

The DFT can be viewed as the projection--the inner or dot product--of the time series vector on each of the vectors representing a complex exponential of unit amplitude at a Fourier frequency.

Variance: Parseval's theorem¶

Given the conservation of information involved in the FT and its inverse, and the orthogonality of the new set of basis vectors on which the FT projects the original data, we arrive at an important theorem. In its discrete form, it states that the sum of the squares of the signal samples equals the sum of the squared amplitudes in its FT, times a normalization factor that depends on how the FT has been defined. For the definition used here, we have Σ|ak|2=1NΣ|Ak|2

This means that the variance can be divided into independent contributions from each data point, or it can be divided into independent contributions from each Fourier frequency in the FT. The latter leads to the concept of the spectrum which we will explore in two subsequent notebooks. First, we need to get a good feel for how the DFT works and what it does.

Aliasing¶

Adding any integer multiple of 2Ï€i to the argument of the exponent in the DFT or its inverse has no effect.

exp(−i2π(k+NM)nN)=exp(−i2π(knN+nM))=exp(−i2π(knN))

Therefore Ak+MN = Ak for any integer M.

The Nyquist frequency¶

The highest frequency that can be resolved is the one with two samples per cycle: the peak of the sinusoid, and the valley. This is the Nyquist frequency, fN=1/(2Δt) cycles per unit time; the period is the reciprocal, 2Δt.

Negative frequencies, or frequencies above Nyquist¶

If the highest frequency we can resolve is the Nyquist, this leaves nearly half of the Fourier frequencies too high to be sampled properly, correct? Not exactly. Those Fourier frequencies above the Nyquist are equivalent to negative frequencies below the Nyquist:

exp(−i2π(N−k)nN)=exp(−i2π(−knN+n))=exp(−i2π(−knN))=exp(i2π(knN))

Therefore AN−k = A−k The highest Fourier frequency, for example, fN−1=f−1 which is just one cycle per record length.

What is the meaning of negative frequencies? I depends on whether the original time series is real, like sea-surface height, or complex, like horizontal current velocity, with east and north components combined into a complex number, u+iv.

If it is real, A−k = A∗k. The negative frequencies contribute no new information.

If it is complex, then the positive frequencies correspond to a vector rotating counter-clockwise, and the negative frequencies correspond to a vector rotating clockwise.

DFT Examples¶

Let's look at examples of the Fourier transform as applied to simple time series. We will set things up with some imports and matplotlib defaults:

In [1]:
%matplotlib notebook 

import numpy as np
from numpy.fft import fft, ifft, fftshift, fftfreq
import matplotlib as mpl
import matplotlib.pyplot as plt

plt.rc('lines', markersize=10, markeredgewidth=1.5)
plt.rc('figure', dpi=80, max_open_warning=False)
plt.rc('savefig', dpi=80)

Now make a function to plot the original time series and its transform:

In [2]:
def show_transform(x, y):
    """
    Plot y(x) and its transform.
    
    y can be real or complex.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    fy = fftshift(fft(y))
    freqs = fftshift(fftfreq(len(x), d=(x[1] - x[0])))    
    fig, axs = plt.subplots(nrows=2, constrained_layout=True)

    for ax in axs:
        ax.margins(x=0.05, y=0.1)
        ax.grid(True)
        ax.locator_params(symmetric=True)
        # (Line above doesn't seem to be working as expected...)
            
    ax = axs[0]
    if y.dtype.kind == 'c':
        ax.plot(x, y.real, 'r+', x, y.imag, 'bx')
    else:
        ax.plot(x, y, 'k.')
    ax.set_xlabel("Time")
    
    ax = axs[1]
    ax.plot(freqs, np.abs(fy), 'ko', mfc='none', label='amp')
    ax.plot(freqs, fy.real, 'r+', label='real')
    ax.plot(freqs, fy.imag, 'bx', label='imag')

    ax.set_xlabel("Frequency, cycles per unit time")
    ax.legend(loc="best", 
                  numpoints=1,
                  fontsize='small')
    

    return fig, axs    
        

Transform of a sinusoid¶

Next, make a function to generate a time series with a single sinusoid, and to plot its time series:

In [3]:
def sinusoid_pts(npts = 16,
             cycles = 3, # cycles per record
             phase = 0, # phase shift in cycles
             dt = 1, # sample time interval
             rotation=None # 'positive' or 'negative' for complex exponential
             ):
    f = cycles / (npts * dt)
    x1 = np.arange(npts, dtype=float) * dt
    phi = 2 * np.pi * (f * x1 - phase)
    if rotation is None:
        y1 = np.cos(phi)
    elif rotation == 'positive':
        y1 = np.exp(1j * phi)
    else:
        y1 = np.exp(-1j * phi)
        
    return x1, y1

def sinusoid(*args, **kw):
    x1, y1 = sinusoid_pts(*args, **kw)
    fig, axs = show_transform(x1, y1)
    return fig, axs

Start with the default parameters, 3 cycles within a short record:

In [4]:
npts = 16
sinusoid(npts=npts, cycles=3)
Out[4]:
(<Figure size 512x384 with 2 Axes>,
 array([<matplotlib.axes._subplots.AxesSubplot object at 0x115f1c8d0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x115f93a50>],
       dtype=object))

What happens if the signal frequency is above the Nyquist frequency?

In [5]:
npts = 32
cycles = npts - 3
fig, axs = sinusoid(npts=npts, cycles=cycles)
dt = 0.05  #  Resolve the aliased curve
x, y = sinusoid_pts(npts=npts/dt, cycles=cycles, dt=dt)
axs[0].plot(x, y, 'r')
Out[5]:
[<matplotlib.lines.Line2D at 0x115ff90d0>]

Notice that we do indeed have a simple cosine curve with 3 cycles within the record of length 32 points. The transform has zero imaginary part, and the real part is symmetric about zero frequency, as expected. Because we chose an integral number of cycles per record length, which is one of the Fourier frequencies, there is only one positive frequency with a non-zero amplitude, and it is of course the third one above zero.

Now try giving it a phase shift of 0.25 cycle:

In [6]:
sinusoid(phase=0.25)
Out[6]:
(<Figure size 512x384 with 2 Axes>,
 array([<matplotlib.axes._subplots.AxesSubplot object at 0x11609c890>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1160d4990>],
       dtype=object))

That changed the curve from a cosine into a sine. Now the transform has no non-zero real part. Its imaginary part is antisymmetric about zero frequency, and is non-zero only for 3 cycles per record length, as expected.

What happens if the time series is a cosine, but has a non-integer number of cycles per record length?

In [7]:
sinusoid(cycles=3.5)
Out[7]:
(<Figure size 512x384 with 2 Axes>,
 array([<matplotlib.axes._subplots.AxesSubplot object at 0x116133610>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11616cc50>],
       dtype=object))

The largest amplitudes are at 3 and 4 cycles per record length, as expected, but now there are non-zero real and imaginary amplitudes at all frequencies. This is the phenomenon of leakage, and it is fundamental to the Fourier transform and to all data analysis techniques based on it.

In [8]:
sinusoid(cycles=3.2)
Out[8]:
(<Figure size 512x384 with 2 Axes>,
 array([<matplotlib.axes._subplots.AxesSubplot object at 0x1161d37d0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11620b590>],
       dtype=object))

Now the signal is closer to a Fourier frequency, and the largest amplitude is at that nearest frequency, 3 cycles per record length. The distribution of real and imaginary parts of the amplitude is more complicated than in the previous example, but still shows the expected symmetry: Hn=H∗−n.

Try complex input, a complex exponential at 3 cycles per record length rotating counter-clockwise:

In [9]:
sinusoid(rotation='positive')
Out[9]:
(<Figure size 512x384 with 2 Axes>,
 array([<matplotlib.axes._subplots.AxesSubplot object at 0x11626b690>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1162a4810>],
       dtype=object))

And rotating clockwise:

In [10]:
sinusoid(rotation='negative')
Out[10]:
(<Figure size 512x384 with 2 Axes>,
 array([<matplotlib.axes._subplots.AxesSubplot object at 0x116302c10>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11633bf50>],
       dtype=object))

Notice the expected phase relationship between the real and imaginary parts of the input, and the isolation of the Fourier transform to a single non-zero amplitude, at a positive frequency for counter-clockwise rotation, and at a negative frequency for clockwise rotation.

Transform of a pulse¶

We have seen that the FT of a sinusoid is highly peaked--that is, it is localized in frequency space, even though some energy is spread by leakage. In the ideal case, where the sinusoid being sampled has an integral number of cycles per record length, the FT is a single spike (or two spikes, if we treat positive and negative frequencies separately.) What happens when the signal is peaked (highly localized) in the time domain? Let's look at the FT of a pulse of variable width and location:

In [11]:
def pulse(npts = 32, uplims=None, dt=1):
    """
    Generate and display a pulse signal and its transform.
    """
    x1 = np.arange(npts, dtype=float) * dt
    y1 = np.zeros_like(x1)
    if uplims is None:
        upslice = slice(0, npts // 2)
    else:
        upslice = slice(*uplims)
    y1[upslice] = 1    
    show_transform(x1, y1)

Try a pulse at the origin--a single non-zero point:

In [12]:
pulse(uplims=[0, 1])

Shift the raised point to the right:

In [13]:
pulse(uplims=[1,2])

Notice that although the transforms of these two very similar signals look quite different, they actually differ only in phase; in both cases, the magnitude of the complex amplitude is uniform with frequency. The most extreme localization in the time domain produces the most complete uniformity in the frequency domain.

Now try a wider pulse:

In [14]:
pulse(uplims=[0,4])
In [15]:
pulse(uplims=[0, 8])
In [16]:
pulse(uplims=[0, 16])

With a wider pulse--less localized in space--the frequency range with large amplitude gets narrower.

Transform of noise¶

We noted in the beginning that the FT is particularly useful for analyzing noise-like time series, with no obvious periodicity. In such cases, we are usually interested in knowing the spectral character of the data, often described with an analogy to the visual spectrum. A white time series is one with roughly uniform variance per unit frequency; a red spectrum is weighted towards low frequencies; and a blue spectrum is weighted toward high frequencies.

In [17]:
def noise(npts = 32, color='w', dt=1, repeatable=True):
    """
    Generate and display a noise signal and its transform.
    
    Color may be 'r', 'w', or 'b'.

    To experiment with different sets of pseudo-random numbers,
    set *repeatable* to *False*.
    """
    if repeatable:
        np.random.seed(0)
    x1 = np.arange(npts, dtype=float) * dt
    y1 = np.random.randn(npts + 1)
    if color == 'w':
        y1 = y1[:npts]
    elif color == 'r':
        y1 = y1.cumsum()[:npts]
    elif color == 'b':
        y1 = np.diff(y1)
    y1 -= y1.mean()    
    show_transform(x1, y1)

For each of the following, change the repeatable kwarg to 'False' and run the cell repeatedly to see how the results change with each realization of the random process.

In [18]:
noise(color='r', repeatable=True)
In [19]:
noise(color='w', repeatable=True)
In [20]:
noise(color='b')

From the three examples above, progressing from red to white to blue, you can see the change in amplitude distributions, from emphasizing low frequencies, to uniform, to emphasizing high frequencies. Because we are working with random samples, the FT values are also random.

Transform of a trend¶

In [21]:
def trend(npts = 32, dt=1):
    """
    Generate and display a linear trend and its transform.
    """
    x1 = np.arange(npts, dtype=float) * dt
    y1 = np.arange(npts, dtype=float)
    y1 -= y1.mean()
    
    show_transform(x1, y1)
In [22]:
trend()

Inverse transform¶

Apart from floating point truncation error, the inverse FT reverses the action of the FT. We can put a step in the middle, however, to attenuate or remove the energy at some frequencies, thereby filtering the signal. We will illustrate this here using a sharp low-pass cutoff. The pulse will be the first guinea pig. The effect of the sharp cutoff will be more clear with a longer time series, hence the choice of 256 points.

In [23]:
def pulse_signal(npts = 256, uplims=None, dt=1):
    """
    Generate and return a pulse signal.
    """
    x1 = np.arange(npts, dtype=float) * dt
    y1 = np.zeros_like(x1)
    if uplims is None:
        upslice = slice(0, npts // 2)
    else:
        upslice = slice(*uplims)
    y1[upslice] = 1    
    return x1, y1
In [24]:
def reconstruct(y, cutoff=None):
    """
    Reconstruct a time series via the ifft of its fft, optionally cutting
    out frequencies at and above a threshold.

    cutoff is the lowest frequency in integer cycles per record length
    that is to be omitted in the reconstruction.
    """
    y = np.asarray(y)
    npts = len(y)
    fy = fft(y)
    if cutoff is not None:
        if cutoff > 1:
            fy[cutoff:-(cutoff-1)] = 0
        else:
            raise ValueError("cutoff must be greater than 1")
    yr = ifft(fy)
    if y.dtype.kind == 'c':
        return yr
    return yr.real       
In [25]:
def show_pulse_reconstruction(cutoffs, **kw):
    """
    cutoffs must be a list of cutoff frequencies in cycles per record length
    """
    x, y = pulse_signal(**kw)
    yrs = np.zeros((len(x), len(cutoffs)), dtype=float)
    for i, cutoff in enumerate(cutoffs): 
        yrs[:, i] = reconstruct(y, cutoff=cutoff)
    fig, ax = plt.subplots()
    line0 = ax.plot(x, y, color='k', lw=3)
    lines = ax.plot(x, yrs)
    linelist = line0 + lines
    labels = ["orig"]
    labels += ["%s" % c for c in cutoffs]
    ax.legend(linelist, labels)
    ax.grid(True)
    ax.set_xlabel('Time')
    ax.set_title('Truncated Fourier series')
    
In [26]:
show_pulse_reconstruction([3, 5, 10, 20])

Notice that the reconstruction improves as we move the cutoff out to higher frequencies, but it always oscillates about its target. Is there a better way to filter using the FFT and IFFT?

Instead of using a sharp cutoff, let's try tapering down to zero.

In [27]:
def reconstruct2(y, cutoff=None, taperfrac=0.25):
    """
    Reconstruct a time series via the ifft of its fft, optionally cutting
    out frequencies at and above a threshold.

    cutoff is about half-way down the slope of the taper.
    """
    y = np.asarray(y)
    npts = len(y)
    fy = fft(y)
    if cutoff is not None:
        if cutoff > 1:
            w = np.ones((npts,), dtype=float)
            ntaper = int(round(taperfrac * cutoff))
            ntaper = min(cutoff - 2, ntaper)
            w[cutoff + ntaper:-(cutoff + ntaper - 1)] = 0
            w[cutoff - ntaper:cutoff + ntaper + 1] = np.linspace(1, 0, num=2 * ntaper + 1)
            w[-cutoff - ntaper:-cutoff + ntaper + 1] = np.linspace(0, 1, num=2 * ntaper + 1)
        else:
            raise ValueError("cutoff must be greater than 1")
    yr = ifft(fy * w)
    if y.dtype.kind == 'c':
        return yr, w
    return yr.real, w       
In [28]:
def show_pulse_reconstruction2(cutoffs, taperfrac=0.25, **kw):
    """
    cutoffs must be a list of cutoff frequencies in cycles per record length
    """
    x, y = pulse_signal(**kw)
    yrs = np.zeros((len(x), len(cutoffs)), dtype=float)
    for i, cutoff in enumerate(cutoffs): 
        yrs[:, i] = reconstruct2(y, cutoff=cutoff, taperfrac=taperfrac)[0]
    fig, ax = plt.subplots()
    line0 = ax.plot(x, y, color='k', lw=3)
    lines = ax.plot(x, yrs)
    linelist = line0 + lines
    labels = ["orig"]
    labels += ["%s" % c for c in cutoffs]
    ax.legend(linelist, labels)
    ax.grid(True)
    ax.set_xlabel('Time')
    ax.set_title('Tapered Fourier series')
    
In [29]:
show_pulse_reconstruction2([3, 5, 10, 20])
In [30]:
show_pulse_reconstruction2([3, 5, 10, 20], taperfrac=0.75)

We used a very crude tapering, but it still helps suppress the wiggles.

Filtering: windows and convolution¶

Filtering a time series, typically to smooth out high-frequency wiggles that are considered noise so that we can see the underlying low-frequency variability, can be done with a running mean, or with a running weighted mean. The mathematical operation is convolution.(r∗s)j=Σk1k0sj−krk

If the filter weights, r, have m nonzero values, then k0=−(m−1)/2 and k1=(m−1)/2 if m is odd, as is most appropriate for filtering. If m is even, the limits are −m/2+1 and m/2, respectively. Here we are letting the window index start with a negative value; in a computer application, we would redefine the expression and indices so that they start at zero or one, depending on the language.

Notice that the convolution is done by sweeping a reversed window, r, over the signal, s. Obviously there are problems at the ends of the signal, but we will ignore them for now.

A nice window is the Blackman:

In [31]:
def blackman_weights(npts):
    """
    Return the *npts* non-zero Blackman window weights.
    """
    x = np.linspace(-1, 1, npts + 2, endpoint=True) * np.pi
    x = x[1:-1]
    w = 0.42 + 0.5 * np.cos(x) + 0.08 * np.cos(2 * x)
    return w / w.sum()

def wrapped_blackman(npts, ndata):
    """
    Return an array with *ndata* points, in which *npts*
    Blackman weights are arranged with the peak as the
    first point.  Points to the left of the peak are
    wrapped to the end of the array.
    """
    w = blackman_weights(npts)
    ww = np.zeros((ndata,), dtype=float)
    mid = (npts - 1) // 2
    ww[:mid+1] = w[mid:]
    ww[-mid:] = w[:mid]
    return ww
    

Make a quick plot of the wrapped_blackman and its transform. Notice that the wrapped_blackman is an even function, w[−k]=w[k] so its transform is real.

In [32]:
w = wrapped_blackman(9, 16)
fw = fft(w)
fig, axs = plt.subplots(nrows=2, figsize=(6, 7))
plt.subplots_adjust(hspace=0.35)
axs[0].plot(w, 'k.')
axs[0].set_title("9-point Blackman centered on 0, 16 points")
axs[0].set_xlabel("Index")
axs[1].plot(fw.real, 'r+')
axs[1].plot(fw.imag, 'bx')
axs[1].set_title("FT of Blackman")
axs[1].set_xlabel("Cycles per record length")
for ax in axs:
    ax.margins(0.05)
    ax.grid(True)

Now compare the result of filtering in the time domain to multiplication in the frequency domain.

In [33]:
x, y = pulse_signal()
nblack = 31
wblack = blackman_weights(nblack)
wblack_wrapped = wrapped_blackman(nblack, len(y))

fig, ax = plt.subplots()
ax.plot(x, y, color='k', lw=10)
ax.plot(x, np.convolve(y, wblack, mode="same"), color='r', lw=4)

ax.plot(x, ifft(fft(y) * fft(wblack_wrapped)).real, color='b', lw=1)
ax.margins(0.05)
ax.legend(['orig', 'convolve', 'FT product'], loc="upper right")
ax.set_title("Convolution in time = multiplication in frequency")
Out[33]:
Text(0.5, 1.0, 'Convolution in time = multiplication in frequency')

We have taken no care in handling end effects, and we see that the two filtering methods as implemented here differ in their end effects. We could modify the methods to improve their handling of the ends, but that is beside the point here. The main things to notice are:

  • Apart from the end effects, we have verified that conventional time-domain filtering, which is convolving a set of weights with the signal, is indeed equivalent to multiplying the transformed signal by the transform of the filter (after properly positioning it in an array of the same length as the signal), and then taking the inverse transform. This may be referred to as windowing the transform.
  • Comparing this figure to the results we got with a sharp cutoff window, we see that using a smooth window like the Blackman eliminates the ringing--the wiggles at a frequency just below the cutoff--that occurs with a sharp cutoff.

The equivalence of convolution in the time domain to multiplication in the frequency domain is the discrete convolution theorem.

In [ ]: