Detailed CODAS Guide

Note

As of April 2013, this version of the documentation is no longer maintained; however, it is consistent with the last (now retired) version of CODAS processing that used Matlab. We no longer update or fix the Matlab processing code, but we will maintain the Matlab code that reads CODAS Matlab output. Although the notes refer to both Python and Matlab processing, none of the code here will be maintained. This (now retired) documentation and code will remain available for a while longer.

Python processing code is actively maintained and developed, and CODAS Python processing is documented here.

Using quick_adcp.py to process ADCP data


Introduction

Shipboard ADCP data processing requires several steps. The reality of data (and all the things that can go wrong) makes the steps more complicated than one might expect at first. Basic processing of a clean dataset is easy; any problem with the data increases the complexity of the processing.

CODAS (Common Oceanographic Data Access System) is a database designed to store ADCP data and associated information (e.g. heading, time, position). The CODAS database is simply a vehicle for storage and organization of the ADCP data while various processing steps are run. “CODAS Processing” refers to the University of Hawaii collection of programs that use the CODAS database to process ADCP data.

CODAS processing steps are designed to be flexible (to cope with different data sources and problems encountered during processing) and automatable (so the basic steps can be run easily with minimal overhead per dataset). Many datasets do not need all of the flexibility available, but since some data streams sporadically fail and some improve over time, there are necessarily many options in CODAS.

Quick_adcp.py is a tool to streamline the basic processing steps and provide a uniform naming convention for the various files used in processing. Some of its switches are required, and others specify the kind of data being processed. This link is a description of quick_adcp.py and contains a table of the acquisition programs and data types it supports.

This document is designed to introduce the CODAS processing steps run by quick_adcp.py, point to other tools and resources, and help the user understand how to use quick_adcp.py for their particular dataset.

NOTE: Prior to using Univ. Hawaii CODAS processing software, a computer must be set up with the appropriate software. Details of the computer setup are available here.

CODAS processing overview

Here are the basic steps in CODAS processing of a shipboard ADCP dataset. You may have other steps that need to be addressed.

  1. Stage averages on disk (one of these)

    • load pre-averaged ADCP data

      • DAS2.48 (RDI’s DOS software creates “pingdata” files, usually from an NB150 ADCP).
    • stage pre-averaged ADCP data on disk

      • VmDAS (RDI’s Windows software creates “LTA” or “STA” files). Runs on BB, WH, or OS (not NB).
    • make averages from single-ping data, stage on disk

      • UHDAS (Univ Hawaii linux acquisition system, can create averages for use with CODAS processing)
      • VmDAS (RDI’s Windows software creates “ENS” or “ENX” files). These can be averaged using Matlab. ENR files can be converted to a UHDAS-like directory structure and averaged as if they were UHDAS data.

Note

Transect (RDI Windows 3.1 software, usually used with BB, basically unsupported by CODAS processing)

  2. create the database, add all ancillary data to the database
    • scan – look through the times of the data files to see if there are problems with the PC clock or other problems with timestamps; get the time range of the data.
    • load – load the averages into the database
    • navigation steps – clean the navigation using a smoothed ocean reference layer; load the clean navigation into the database.
  3. quality checking and calibration (repeat as necessary)
    • scale factor – if the transducer is fixed (NB, BB, or WH), check the speed of sound; a scale factor calibration may be necessary. It should not be necessary for an OS.
    • heading correction – gross transducer offset, time-dependent heading correction
    • graphical editing – throw out bad bins, bad profiles, and data below the bottom
  4. accessing data
    • adcpsect and adcpsect.py (extract ocean velocities with editing flags applied)
    • getmat and variants (generate matlab files containing most variables: velocity, amplitude, correlation)
    • temperature and heading (accessing temperature and heading)

Setting up a Processing Directory

ADCP processing is done in a directory that is created by running adcptree.py. The processing directory is initialized with a particular collection of subdirectories and files. Some of those files are present for all CODAS processing directories, and some are specific to particular kinds of datasets. The processing directory should be in a working area of your disk, NOT in the UH programs directory tree.

Type “adcptree.py” for usage, specifically if processing averaged data (LTA, STA, pingdata). Type the following to get more usage information for single-ping processing:

adcptree.py --help

NOTE: Examples for this document were run on a linux machine, with UH programs rooted at /home/ulili/programs.

Processing Strategy

It is important to keep the processing directory relatively free of clutter. If the data are worth using, it is likely that someone will look around in the processing directory for information about how the data were processed and what anomalies were present. Preserving a relatively linear path from the start of processing to the end of processing helps the person later figure out what was done. Sometimes the best thing to do is simply delete the whole processing directory and start over.

We recommend a directory structure like this for adcp processing:

ADCP data processing directory structure

Bearing that in mind, here is one approach to processing strategy:

  • make a directory for the cruise that will hold:

    • the processing directory
    • summary notes (metadata file: instrument configuration, dates, ...)
    • detailed notes (e.g. comments about the data, such as gaps, biases)
    • instructions (suitable for cut-and-paste for your next attempt)
    • quality directory for exploration of the data
  • run “adcptree.py” to make your processing directory

  • write down the adcptree command you used in a file

    • name the file something related to the processing directory name
    • keep the documentation file outside the processing directory until you’re done with it. That makes it easier to redo the steps from scratch without losing the file (if you have to start over)
  • type the following to see the right prompts for the datatype you have (e.g. LTA); an example first-pass command is sketched after this list

quick_adcp.py --help
quick_adcp.py --commands lta
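
As a concrete illustration, the first pass over LTA data usually reduces to one quick_adcp.py command. The example below is hypothetical: the paths, database name, and yearbase are made up, and the exact switch names should be confirmed with “quick_adcp.py --commands lta” on your installation:

quick_adcp.py --yearbase 2011 --dbname ademo --datatype lta --datadir /home/data/demo/lta --auto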

If you have to delete the database (rotated too far, rotated back, made a mistake, etc.), you will have good notes about what you did, so it won’t be that hard the next time, and the new processing directory will be nice and clean. In addition, if you keep your notes and exploration OUT of the processing directory, you don’t have to worry about them getting deleted when you delete the processing directory.

Note

In the processing directory is a file called cruise_info.txt, which has the beginnings of a reasonable notes file. Start with that; it is already tailored for your processing directory.

A note about the database name

The CODAS database is actually a collection of binary files whose names are composed of a prefix, a 3-digit number, and the suffix “blk”. There is one database directory file (also binary), whose name has the same prefix and ends in “dir.blk”. Here is an example:

ademo001.blk
ademo002.blk
ademo003.blk
ademo004.blk
ademo005.blk
ademodir.blk

The prefix here, ademo, is called the database name. In the control files used by quick_adcp.py, you will see examples of a relative path to the database, such as

DBNAME ../adcpdb/ademo

This name is specified when quick_adcp.py is run, and is the prefix for many files. Those files are distinguishable by the directory they are in and the suffix they have.

NOTE: Anytime a database name is referred to in this document, the example will use ademo. Anytime the string ademo is used, it is referring to the database name.

Making the processing directory

Following the instructions in the demo or in the quick_adcp.py help, use adcptree.py to create a processing directory.
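
For example, for pre-averaged data the invocation can be as simple as the (hypothetical) command below; single-ping processing (UHDAS, ENS/ENX) takes additional options that “adcptree.py --help” describes:

adcptree.py ademo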

The processing directory is created with the following subdirectories (the resulting layout is sketched after this list):

  • The ping directory is the default repository of pingdata files (usually called PINGDATA.*). (The data location can be overridden in quick_adcp.py.)

  • The config directory holds UHDAS single-ping processing parameters. It is only relevant if you are going to re-process the data; for post-processing (editing, calibration) it is irrelevant.

  • The scan directory will be used to hold a list of time ranges and time info for each data file. The full time range of the dataset is also stored here.

  • The load directory will be used for loading the data into the database. For pingdata, this is just a program that gets run. For all other data types, a two-step process exists:

    • a set of files (suffix bin and cmd) is generated
    • those files are loaded into the database
    • a set of files (suffix gps2) is generated containing the start and end time (and position) for each averaging period (these form the GPS fixes for the “navigation” steps)
  • The adcpdb directory will contain the database and the configurations used during acquisition

  • The nav directory will be used for navigation calculations, including smoothed reference layer and plots of same.

  • The edit directory will be used by gautoedit (graphical editing).

  • The cal directory will be used for calibration calculations.

    • cal/rotate (time-dependent heading correction stored here)
    • cal/watertrk (watertrack calibration)
    • cal/botmtrk (bottom track calibration)
    • cal/heading (not used by quick_adcp.py)
  • The contour directory is used to store data suitable for making contour plots (e.g. 15-minute averages in 10-20 m vertical bins)

  • The vector directory is used to store coarser averages suitable for vector plots (e.g. hourly averages in 50 m vertical bins)

Other directories – these are sort of fossils:

  • The grid directory will be used to grid the data for plotting.
  • The quality directory contains Matlab scripts for plotting on-station and underway profile statistics; this is a good place to stage your own QC investigations.
  • The stick directory contains programs to make summary plots of some specific spectral information, treating the data as a time-series (not used by quick_adcp.py)
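
Putting the pieces together, the top level of a (hypothetical) processing directory named ademo looks roughly like this; the exact set of subdirectories depends on the options given to adcptree.py:

ademo/
    adcpdb/    cal/      config/    contour/   edit/     grid/
    load/      nav/      ping/      quality/   scan/     stick/
    vector/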


UHDAS postprocessing

NOTE: This document does not address the setup of UHDAS processing from scratch. Under most circumstances, a UHDAS dataset brought back from sea is already processed, and all that remains is manual post-processing.

A UHDAS demo for post-processing (directory and instructions) starts here. You should be familiar with the postprocessing steps: heading correction, calibration, editing.


quick_adcp.py operations

Quick_adcp.py contains quite a bit of documentation about itself.

Running “quick_adcp.py --help” shows how to get help with the usual kinds of datasets. On-line access is here.

Note

Always run quick_adcp.py from the root of the processing directory


First pass with quick_adcp.py

SCAN

Prior to loading the database, we scan the data files in order to determine whether there are issues with timestamps that need to be addressed. The “Scan” step performs two operations:

  • list time ranges and perhaps other information about the data files
  • create a file with the time range of the data

Pingdata are scanned using the executable “scanping”. Go to the original CODAS processing document and look at the “scan” section for a complete description of the output of “scanping”.

All other data (VmDAS and UHDAS) are scanned by the processing engine (Matlab or Python). In either case a little standalone program is written and then run. The main point of this step is to get the time range.

If the database name of this example is “ademo”, the following files are written:

  • ademo.scn (contains timestamp information about the data files)
  • ademo.tr (a human-readable time range, extracted from ademo.scn)

NOTE: In CODAS, times come in two flavors.

CODAS time stamps:

  • year/month/day hour:minute:second (such as 2011/04/17 14:02:32)
  • zero-based decimal day (i.e. January 1 noon UTC is 0.5, not 1.5)
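
For example, 2011/04/17 14:02:32 falls on day-of-year 107 of 2011 (not a leap year), so its zero-based decimal day is

106 + (14*3600 + 2*60 + 32)/86400 = 106.5851

That is, decimal day 0.0 is midnight UTC at the start of January 1 of the yearbase.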


LOAD

The “load” step creates the database (i.e. loads the data into the database). For pingdata, a single executable program (“loadping”) reads the pingdata bytes and stores them in the proper locations in the database (creating the *blk files in the adcpdb/ directory). For all other data types, there is one universal load-the-database program, “ldcodas”. The “ldcodas” program reads data from the load directory and creates the database from those files. The files read by ldcodas are stored in the load directory as pairs of files (*cmd and *bin), with the cmd files containing instructions to “ldcodas” and the bin files containing the data. The *.gps2 files contain times and positions at the end of the ensembles; those are used later (in the navigation steps).

For LTA data (the only other pre-averaged data), a program translates the LTA bytes into the *bin and *cmd and “ldcodas” loads the data into the database.

For ENX and ENS data (VmDAS single-ping data), groups of (typically) 5 minutes of pings are assembled, edited at the single-ping level, then averaged and written out as *bin and *cmd files. Then (part 2) ldcodas loads the database. ENX and ENS data already contain navigation and have corrected timestamps; ENS may or may not have heading.

For UHDAS data, even more configuration information is required, since the raw data do not yet have corrected timestamps or any ancillary data. Processing UHDAS data from scratch requires a good grounding in CODAS processing, which can be obtained by working through examples with LTA data or by post-processing a UHDAS dataset. Processing UHDAS data entails reading the single-ping ADCP data (correcting time and adding navigation and attitude), editing the single pings, averaging the data, and writing the *bin and *cmd files. Again, (part 2) ldcodas creates the database.

Technical notes about pingdata

  1. Both programs (loadping and ldcodas) that create a database need to know how to read the data. This information comes from a “definition” file, containing data and structure definitions. For pingdata, the definition file depends on the “user buffer” used during acquisition. The various definition files are already in the adcpdb/ directory. See section 5.2 in the PostScript CODAS manual for details about pingdata and user buffers, and this link for the description in the original pingdata demo. The “ldcodas” program uses one definition file, called “vmadcp.def”, and it is located in the load directory.
  2. Pingdata may have useful information, such as better navigation, secondary navigation, or heading correction, embedded in a specific portion of memory called the “user buffer”. The contents of the “user buffer” depend on the “user exit” program run during acquisition. If ue4 was used, “ubprint” can be used to extract the improved navigation and the Ashtech heading correction, if they exist.
  3. If you are processing pingdata for the first time, you are advised to consult the original pingdata demo processing documentation frequently.


CHECKING THE DATABASE

After any quick_adcp.py step, you may want to check the database to see what changes you have made and to ensure that everything is working as expected. This is not a step run by quick_adcp.py; you can go to another command-line window and run this command as you work your way through the quick_adcp.py steps.

An ASCII menu-driven command-line utility, “showdb”, exists to probe the database and determine what is stored in it. There is no substitute for this important but old-fashioned program. On the command line, you must specify the database name, including the path, such as:

showdb adcpdb/ademo

The original pingdata documentation explains showdb in detail, using the original pingdata demo, in which a database was created using two pingdata files. There are various examples of showdb throughout the original pingdata demo instructions.

Note

You can use showdb to check various aspects of the database. For instance, immediately after loading, the database will contain measured velocities, depths, heading, and various configuration information, but the positions will show MAX, i.e. bad values: positions are loaded in a later step.


Heading correction

Accurate heading is essential for high-quality ADCP data. An error in heading of theta degrees causes an error in the cross-track direction that scales as

error = shipspeed * sin(theta).

For a ship travelling at a typical cruising speed, i.e. 5 m/s (about 10 kt), a one-degree error in heading causes a cross-track error of nearly 10 cm/s (5 m/s * sin(1 degree) is about 0.09 m/s). Gyros, especially older gyros and especially at low latitudes, can wander significantly, causing completely spurious cross-track errors that manifest as “eddies” in the data.

An example of a few degrees offset is shown deep in the documentation for the gui editing utility, “gautoedit”.

This page graphically illustrates the difference an error of 2 degrees makes on a dataset.

Headings can come from a variety of sources, some more accurate than others. Your access to headings depends on a variety of factors:

  • the acquisition system (DAS2.48, DAS2.48+ue4, VmDAS, UHDAS)

  • the instruments available on the ship (gyro (serial or synchro), Ashtech, Seapath, POSMV, various optical gyros)

  • the setup of the acquisition system and data feeds (wrong messages? instrument failed? data recorded elsewhere?)

You need to know (or find out) the sources of heading for your dataset, what is available where (which files contain which information), and what heading source was used for processing. If there is only one heading device, your options are limited. If there are both gyro and some other heading source, you can compare them to see what the differences are in quality and behavior, and either correct the data (if acquired with gyro) to the other source, or (if processed with the other source) possibly make a statement about that instrument’s data quality.

If you have pingdata, and if ue4 was used, and if there was an Ashtech on board, the likely source of heading correction data is the output of ubprint. After loading the database, the time-dependent heading correction can be examined (corrected if necessary) and applied to the database.

If you have VmDAS data, you may have more than one source for heading, depending on whether there is a synchro gyro input (Ocean Surveyor only). Check the N1R, N2R, and N3R files for heading messages. If gyro was used as the primary heading device, a time-dependent heading correction might be computed and applied to the database.

If you have UHDAS data, the strategy is to use gyro data for the initial conversion from beam to earth coordinates, and correct that with the 5-minute average of the difference between the gyro and the other heading device. In older processing, this was done in a batch mode, rotating the database after the database was created. In newer UHDAS processing, the heading correction is built into the averages (*bin and *cmd files) before they are loaded into the database, and the values used are recorded to disk (cal/rotate/ens_hcorr.ang)

Running quick_adcp.py, the “rotation” stage of preliminary processing is as follows:

Initial Rotation in quick_adcp.py

acquisition     heading correction file    processing    applied
das2.48+ue4     cal/rotate/ademo.ang       batch         using “rotate”
ENX, ENS        (none)                     batch         (none)
LTA, STA        (none)                     batch         (none)
UHDAS           ens_hcorr.ang              at-sea        embedded IN the averages

Notes:

  • A time-dependent heading correction file can be generated and applied after the first-pass processing is complete.

  • A constant angle rotation is usually necessary in the second-pass steps

  • You can use showdb to examine the correction value (ANCILLARY_2)

  • You can extract the original heading and the total correction presently in the database using lst_hdg

  • You can return the heading correction to zero by one of two methods:

    • use “rotate unrotate.tmp”, where “unrotate.tmp” is a copy of “rotate.cnt” modified to use the keyword “unrotate!”

    • (deprecated, but might be necessary) make a new angle file using the corrections listed by lst_hdg, with decimal day in the first column and -1*(hcorr) in the second column (i.e. reverse the sign of the heading corrections from lst_hdg), and apply that as a time-dependent angle file.

The rotation step takes place in the cal/rotate directory. If there is a time-dependent heading correction file, plot the corrections and make sure you are applying something reasonable to the database. For pingdata, the program “ashrot.m” will write out Ashtech statistics and make a plot of the heading correction. Matlab processing leaves some programs for you to edit and run yourself (to create heading correction files); Python processing makes the plots for you to view.
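
For the batch case, the rotation itself is run from the cal/rotate directory, giving the “rotate” program a control file that names the database and the angle file. A minimal sketch (assuming the control file, here rotate.cnt, has already been edited for this cruise):

cd cal/rotate
rotate rotate.cnt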


Calibration

Bottom track calibration uses bottom track data (if there is any) to determine the remaining transducer offset to make the ship track over ground match the track measured by the ADCP.

Watertrack calibration uses the idea that the water velocities should not look any different whether the ship is stopped or moving, turning or going straight. It uses parameters to find times when a significant acceleration was detected (a turn and/or speed change) and calculates what rotation and scale factor would be necessary to make the ocean velocity look the same before and after the acceleration. The calculation is necessarily noisy, and depends on ship behavior. For instance, there are probably no watertrack calibration points on a steady transit, and many on a CTD hydrography cruise or a bathymetric mapping cruise.

The parameters used to detect watertrack calibration points can be tuned, but quick_adcp.py uses a particular set and runs the calibration steps during the first pass. These are diagnostics, and give the user some information about further rotation or scaling necessary. More detail about bottom track and watertrack calibrations is contained in the original pingdata demo document.


Postprocessing: Determining and Applying Calibrations, and Editing

After you’ve run quick_adcp.py for the first pass, you must edit the data (to remove things like wire interference, data below the bottom, bad profiles), investigate the necessity for further rotation, and decide whether a scale factor is required. This is an iterative process.

If there is a large constant heading error, or a time-dependent heading correction is still needed, it is much harder to edit the data, because changes in ship speed will cause changes in the measured ocean velocity which may look like errors.

To converge on your final dataset,

  • Ensure that the time-dependent heading correction (if it exists) is as good as possible. If you change the heading correction, rerun quick_adcp.py with --steps2rerun navsteps:calib
  • Look at the bottom track and watertrack calculations, and apply any gross (larger than 1/2 degree) phase correction (a sketch of the rotation command follows this list). If you rotate the database, rerun quick_adcp.py with --steps2rerun navsteps:calib
  • Apply any scale factor if necessary (see below). This “should” be unnecessary for Ocean Surveyors, but is not unexpected for fixed transducer heads, such as the NB150 or WH300.
  • Go through the data with gautoedit deleting obvious additional bad data. Apply the editing by running quick_adcp.py with these options (fill in your yearbase)

quick_adcp.py --steps2rerun apply_edit:navsteps:calib  --auto

  • In the last editing pass, you should click “do not show autoedit editing” so you can see what is actually in the database, not the effect of the gautoedit defaults
  • repeat until they do not change: check editing; apply calibrations (normally there will be up to 3 passes through the editing, with the last requiring no additional flagging, and up to two applications of phase and scale factor calibration values)
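
The rotation itself can be applied through quick_adcp.py. The sketch below is hypothetical: the angle and amplitude values are made up, and the switch names (--rotate_angle, --rotate_amplitude) and the “rotate” keyword in --steps2rerun should be checked against “quick_adcp.py --help” for your version:

quick_adcp.py --steps2rerun rotate:navsteps:calib --rotate_angle -0.45 --rotate_amplitude 1.003 --auto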

More discussion follows:

heading correction:

Heading correction has two components: a time-dependent correction of gyro to Ashtech (or POSMV, or Seapath), and a remaining constant offset. See the Heading Correction section for more detail.

To inspect the heading correction used in an at-sea UHDAS processing directory, go to the cal/rotate directory, edit (and run in Matlab) the file called plot_hcorrstats_all.m, then look at hcorr.ps (the output file).

If you need to fix the heading correction (e.g. there are gaps where no heading correction was applied), you must remove the already-existing heading correction by “unrotating” the database. Then fix the heading correction file, and rotate using the new file. “Unrotating” can be done by using the “unrotate” option in rotate.cnt, or one can rotate by the negative of the values used (i.e. rotate by -1 times the values in ens_hcorr.ang).

After the time-dependent correction is made, there may still be a constant offset. Estimates of that value are in the “phase” in watertrack and bottom track calibration files, or from “recip.m” (if there is a reciprocal track available). More details are in the original pingdata demo document for bottom track and watertrack calibrations.

Scale factor

For a fixed transducer instrument (NB, BB, or WH), a scale factor may be necessary. Check the thermistor temperature to make sure the thermistor is not broken. You may need to fix the speed of sound. It is possible that application of a constant scale factor is all that is necessary. See the original pingdata demo discussion about thermistor checking.
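
As background (a rule of thumb, not a CODAS formula): for a fixed, non-phased-array transducer, the measured velocities scale linearly with the sound speed at the transducer, so the required correction is approximately

scale factor = (true sound speed at the transducer) / (sound speed assumed by the instrument)

For example, if the instrument assumed 1500 m/s but the correct value was 1511 m/s, the velocities are low by about 0.7% and a scale factor near 1.007 would be indicated; the actual calibration values should still come from the watertrack and bottom track estimates.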

If, after editing, the scale factor for an Ocean Surveyor is still greater than 1%, either there is still a problem with the data (e.g. underway bias not edited out) or there is a problem with the instrument. You may need to look at additional datasets or talk to others who have used data from this instrument to try to determine whether you have a problem.

gautoedit editing

Quick_adcp.py sets up the edit directory for the graphical editing tool, gautoedit (Matlab gautoedit docs here).

This tool was designed to screen data for things like ringing, on-station wire interference, jittery navigation or velocities, and bottom interference when bottom tracking was not on.

A UHDAS at-sea processing directory already has the defaults applied. You can use showdb to see that this is true.

NOTE: “Flagging data as bad” is mostly a one-way trip: you can add flags to the collection, but if you want to remove them, things get complicated. This page discusses various scenarios.

Use gautoedit to look through the database and decide whether any of the missing data should be unflagged, or whether you just need to flag some additional data. Click on the button “do not show autoedit editing” to see what is in the database. Click on “do not show profile flags” to see the original data.

adding more profile flags

As you go through gautoedit, you can
  • delete more profiles (time range or individual profiles)
  • delete rectangles (rzap) or polygons (pzap)
  • identify the bottom (bottom)
  • change thresholds to delete more bins or profiles

When you are finished, apply the editing as follows:

quick_adcp.py  --steps2rerun apply_edit:navsteps:calib --auto

check editing and calibrations again

The last time you run gautoedit, you should click “do not show autoedit editing” so you can see what is actually in the database. You should not see any of these signatures in the ocean velocities:

  • transitions between on/off station
  • ship turn
  • bias in the direction of ship motion (usually with low PG)
  • big stripes of missing data at turns or accelerations

Your watertrack and bottom track calibrations should have phase within a few tenths of a degree of zero, with all estimates agreeing (mean, median, watertrack and bottom track) also to within a few tenths of a degree (if there are enough points). The scale factor should be within a fraction of a percent of 1.00 (0.997-1.003), and different estimates should agree (mean, median, watertrack and bottom track).

Accessing the Database

See this page for links to ADCP data access tools.