Equatorial Indian Ocean Moored Current Meter

There is an unusual and nice data set available on the web, with long-term current meter records from the equatorial Indian Ocean. It has gotten little attention so far; I haven't found any journal publications that use the deep measurements, though I might have missed something. The closest thing I have found is in the grey literature: http://www.clivar.org/sites/default/files/documents/Exchanges39.pdf page 5.

The data are available from NIO as described here: http://www.nio.org/index/option/com_nomenu/task/show/tid/2/sid/18/id/5

There is no description of the format; but if you download a file, you will find you can read it as text. It turns out it is not plain ASCII, but rather a common variant called "latin-1" that includes some additional characters such as the degree symbol. These are only in the header. Beyond the header is a flat ascii array. It's a very inefficient format, and some information in the header is incorrect, but we can work with it. For now, at least, we won't bother to try to parse the header.

In [1]:
%matplotlib inline

import os

import numpy as np
import matplotlib.pyplot as plt

from six.moves import urllib # six is for Python 2/3 compatibility
from io import open  # more 2/3 compatibility

from pycurrents.codas import to_day
In [2]:
urlbase = "http://www.nio.org/userfiles/file/datainfo/"
datafiles = ["EQ830014.dat", "EQ830016.dat"]  # We can add to the list later.

# We will only download a file if we don't already have it.
for fn in datafiles:
    print("looking for: ", fn)
    print("exists?", os.path.exists(fn))
    if not os.path.exists(fn):
        
        urllib.request.urlretrieve(urlbase + fn, fn)
        print("Retrieved: ", fn)
    else:
        print("Found: ", fn)
('looking for: ', 'EQ830014.dat')
('exists?', True)
('Found: ', 'EQ830014.dat')
('looking for: ', 'EQ830016.dat')
('exists?', True)
('Found: ', 'EQ830016.dat')

In [3]:
with open(datafiles[0], encoding='latin-1') as fid:
    lines = fid.readlines()
# Use asterisk to mark start of each line:    
print("*%s" % '*'.join(lines[:20]))    
*Type of current meter	RCM8
*Serial No. of current meter	12734
*Ship			ORV Sagar Kanya
*Place of Dep. Latitude	000°03.74' S
*Place of Dep. Longitude	083°04.76' E
*Number of Deployment	1
*Date & Time of Deployment	14/12/2000 06:30 (IST)
*Date & Time of Recovery	24/03/2002 07:50 (IST)
*Recording Interval		1 hour
*Bad Data			-9999.99
*Comment		Cleaned data (deleted the begining and end records). 
*Jday (2nd column)		Julian day from 1 Jan. 2000
*  sno     Lat   long(°E)  jday   yr mon day hour  depth (m)   tem (°C)   pot-T(°C) cond(mS/cm)  sal       dir     spd(cm/s)     U(cm/s)  V(cm/s)
*     1      0     83    350.00  2000 12 15  0    1093.05       5.69       5.58      34.31      34.71     296.80       6.04      -5.39       2.72
*     2      0     83    350.04  2000 12 15  1    1098.98       5.75       5.64      34.46      34.82     281.40       7.78      -7.63       1.54
*     3      0     83    350.08  2000 12 15  2    1098.98       5.69       5.58      34.31      34.71     281.70       7.49      -7.33       1.52
*     4      0     83    350.13  2000 12 15  3    1098.98       5.66       5.55      34.31      34.74     277.50       7.49      -7.43        .98
*     5      0     83    350.17  2000 12 15  4    1093.05       5.60       5.49      34.24      34.72     332.80       2.84      -1.30       2.53
*     6      0     83    350.21  2000 12 15  5    1098.98       5.69       5.58      34.31      34.71     340.50       3.42      -1.14       3.22
*     7      0     83    350.25  2000 12 15  6    1093.05       5.66       5.55      34.31      34.74     338.00       6.33      -2.37       5.87


Evidently there are 13 header lines, which we can discard. We will make a numpy data type (dtype) object to describe the columns. We will be skipping some columns we don't care about.

In [4]:
names = ['year', 'month', 'day', 'hour', 'depth',
         'temp', 'sal',  'u', 'v']
usecols = [4, 5, 6, 7, 8, 9, 12, 15, 16]

dtype = np.dtype(dict(names=names, formats=[np.float32] * len(names)))

fn = datafiles[0]
with open(fn, encoding='latin-1') as fid:
    dat = np.loadtxt(fid, dtype=dtype, 
                        skiprows=13, 
                        usecols=usecols)
    datr = dat.view(np.recarray)
        

I like to use "decimal days": time in days from the beginning of a given year. This is not the same as the confusing and typically misused "Julian day".

In [5]:
yearbase = 2000
dday = to_day(yearbase, datr.year, datr.month, datr.day, datr.hour)

Time to take a look at what we have. Let's just make a quick, dumb plot of everything.

In [6]:
fig, axs = plt.subplots(nrows=5, sharex=True, figsize=(6, 8))

varnames = ['u', 'v', 'temp', 'sal', 'depth']
for name, ax in zip(varnames, axs):
    ax.plot(dday, dat[name])
    ax.locator_params(axis='y', nbins=4)
    ax.set_ylabel(name)
ax.set_xlabel('days from start of %s' % yearbase)    
Out[6]:
<matplotlib.text.Text at 0x1092009d0>

Rats! We have some missing velocity values. Ordinarily we would modify the code above to take care of them, but for pedagogical purposes we will just proceed sequentially. Let's use NaNs to mark off the bad values. (Maybe later we will use masked arrays instead.)

In [7]:
cond = dat['u'] < -9999.0
dat['u'][cond] = np.nan
dat['v'][cond] = np.nan

Now replot:

In [8]:
fig, axs = plt.subplots(nrows=5, sharex=True, figsize=(6, 8))
varnames = ['u', 'v', 'temp', 'sal', 'depth']
for name, ax in zip(varnames, axs):
    ax.plot(dday, dat[name])
    ax.locator_params(axis='y', nbins=4)
    ax.set_ylabel(name)
ax.set_xlabel('days in %s' % yearbase)    
Out[8]:
<matplotlib.text.Text at 0x10a05a4d0>

Success!

Because of the nans in the gaps we can't use an ordinary function for the mean. Instead, we use "nan-aware" versions of the elementary statistics functions.

In [9]:
print(np.nanmean(datr.u), np.nanmean(datr.v))
print(np.nanstd(datr.u), np.nanstd(datr.v))
(-2.7190318, 0.68274707)
(11.571357, 8.3033705)

When working with time series data like this, we often want to concentrate on restricted frequency bands. For example, to filter out most of the tidal component, we can use a low-pass filter.

In [10]:
from pycurrents.num import Blackman_filter
In [11]:
u_detided, uw = Blackman_filter(datr.u, 24 )
In [12]:
fig, ax = plt.subplots()
ax.plot(dday, u_detided)
Out[12]:
[<matplotlib.lines.Line2D at 0x1094535d0>]

The high-frequency part of the record, largely tidal, is just the complete record minus the low-passed part.

In [13]:
u_highpass = datr.u - u_detided
fig, ax = plt.subplots()
ax.plot(dday, u_highpass)
Out[13]:
[<matplotlib.lines.Line2D at 0x1085d5310>]
In [13]: