There is an unusual and nice data set available on the web, with long-term current meter records from the equatorial Indian Ocean. It has gotten little attention so far; I haven't found any journal publications that use the deep measurements, though I might have missed something. The closest thing I have found is in the grey literature: http://www.clivar.org/sites/default/files/documents/Exchanges39.pdf page 5.
The data are available from NIO as described here: http://www.nio.org/index/option/com_nomenu/task/show/tid/2/sid/18/id/5
There is no description of the format; but if you download a file, you will find you can read it as text. It turns out it is not plain ASCII, but rather a common variant called "latin-1" that includes some additional characters such as the degree symbol. These are only in the header. Beyond the header is a flat ascii array. It's a very inefficient format, and some information in the header is incorrect, but we can work with it. For now, at least, we won't bother to try to parse the header.
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
from six.moves import urllib # six is for Python 2/3 compatibility
from io import open # more 2/3 compatibility
from pycurrents.codas import to_day
urlbase = "http://www.nio.org/userfiles/file/datainfo/"
datafiles = ["EQ830014.dat", "EQ830016.dat"] # We can add to the list later.
# We will only download a file if we don't already have it.
for fn in datafiles:
print("looking for: ", fn)
print("exists?", os.path.exists(fn))
if not os.path.exists(fn):
urllib.request.urlretrieve(urlbase + fn, fn)
print("Retrieved: ", fn)
else:
print("Found: ", fn)
with open(datafiles[0], encoding='latin-1') as fid:
lines = fid.readlines()
# Use asterisk to mark start of each line:
print("*%s" % '*'.join(lines[:20]))
Evidently there are 13 header lines, which we can discard. We will make a numpy data type (dtype
) object to describe the columns. We will be skipping some columns we don't care about.
names = ['year', 'month', 'day', 'hour', 'depth',
'temp', 'sal', 'u', 'v']
usecols = [4, 5, 6, 7, 8, 9, 12, 15, 16]
dtype = np.dtype(dict(names=names, formats=[np.float32] * len(names)))
fn = datafiles[0]
with open(fn, encoding='latin-1') as fid:
dat = np.loadtxt(fid, dtype=dtype,
skiprows=13,
usecols=usecols)
datr = dat.view(np.recarray)
I like to use "decimal days": time in days from the beginning of a given year. This is not the same as the confusing and typically misused "Julian day".
yearbase = 2000
dday = to_day(yearbase, datr.year, datr.month, datr.day, datr.hour)
Time to take a look at what we have. Let's just make a quick, dumb plot of everything.
fig, axs = plt.subplots(nrows=5, sharex=True, figsize=(6, 8))
varnames = ['u', 'v', 'temp', 'sal', 'depth']
for name, ax in zip(varnames, axs):
ax.plot(dday, dat[name])
ax.locator_params(axis='y', nbins=4)
ax.set_ylabel(name)
ax.set_xlabel('days from start of %s' % yearbase)
Rats! We have some missing velocity values. Ordinarily we would modify the code above to take care of them, but for pedagogical purposes we will just proceed sequentially. Let's use NaNs to mark off the bad values. (Maybe later we will use masked arrays instead.)
cond = dat['u'] < -9999.0
dat['u'][cond] = np.nan
dat['v'][cond] = np.nan
Now replot:
fig, axs = plt.subplots(nrows=5, sharex=True, figsize=(6, 8))
varnames = ['u', 'v', 'temp', 'sal', 'depth']
for name, ax in zip(varnames, axs):
ax.plot(dday, dat[name])
ax.locator_params(axis='y', nbins=4)
ax.set_ylabel(name)
ax.set_xlabel('days in %s' % yearbase)
Success!
Because of the nans in the gaps we can't use an ordinary function for the mean. Instead, we use "nan-aware" versions of the elementary statistics functions.
print(np.nanmean(datr.u), np.nanmean(datr.v))
print(np.nanstd(datr.u), np.nanstd(datr.v))
When working with time series data like this, we often want to concentrate on restricted frequency bands. For example, to filter out most of the tidal component, we can use a low-pass filter.
from pycurrents.num import Blackman_filter
u_detided, uw = Blackman_filter(datr.u, 24 )
fig, ax = plt.subplots()
ax.plot(dday, u_detided)
The high-frequency part of the record, largely tidal, is just the complete record minus the low-passed part.
u_highpass = datr.u - u_detided
fig, ax = plt.subplots()
ax.plot(dday, u_highpass)