Import Alex's ingest documentation with updates for new ingesters.
jeremyh committed Jul 23, 2015
1 parent 810fa27 commit 1ac2c4a
Showing 6 changed files with 363 additions and 7 deletions.
8 changes: 3 additions & 5 deletions README.md
@@ -8,13 +8,11 @@ This repository contains the code for the AGDC - see [LICENSE](LICENSE) for the
## Branches [![Build Status](https://travis-ci.org/GeoscienceAustralia/agdc.svg?branch=develop)](https://travis-ci.org/GeoscienceAustralia/agdc)

* [master](https://github.com/GeoscienceAustralia/agdc/tree/master) represents the current **stable** version of the AGDC codebase.

* [develop](https://github.com/GeoscienceAustralia/agdc/tree/develop) represents the current **in progress** version of the AGDC codebase.

## Documentation

Documentation for the AGDC can be found [here](http://geoscienceaustralia.github.io/agdc).



- [API Usage](http://geoscienceaustralia.github.io/agdc)
- [Database configuration](database/README.md)
- [Ingestion](agdc/ingest/README.md)

119 changes: 119 additions & 0 deletions agdc/ingest/README.md
@@ -0,0 +1,119 @@


Dataset Ingestion
=================

Purpose
-------

The scripts in this package ingest various datasets into the
Australian Geoscience Data Cube (AGDC). They read the metadata from
the nominated dataset(s) and compare it with the metadata held in the
AGDC database to determine whether each source dataset should be
ingested.

If ingestion is required, the script first writes an intermediate
GDAL-readable band-stack file in the dataset's native projection. This
intermediate file provides a common interface to a generic ingestion
back-end, which then reprojects, resamples and tiles it and catalogues
the resulting dataset and tile entities.
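The ingest-or-skip decision described above can be sketched as follows. This is a minimal illustration only: it uses an in-memory SQLite table as a stand-in for the AGDC catalogue and assumes, hypothetically, that a dataset needs (re)ingesting when it is absent from the catalogue or its recorded modification time differs. The real AGDC schema and comparison rules are not shown here; `make_catalogue`, `needs_ingest`, `record_ingest` and the `dataset` table are illustrative names.

```python
import sqlite3


def make_catalogue():
    """Create an in-memory stand-in for the AGDC dataset catalogue (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (path TEXT PRIMARY KEY, mtime REAL)")
    return conn


def needs_ingest(conn, dataset_path, mtime):
    """Return True if the dataset is not catalogued or its recorded mtime differs."""
    row = conn.execute(
        "SELECT mtime FROM dataset WHERE path = ?", (dataset_path,)
    ).fetchone()
    return row is None or row[0] != mtime


def record_ingest(conn, dataset_path, mtime):
    """Catalogue (or update) a dataset once it has been ingested."""
    conn.execute(
        "INSERT OR REPLACE INTO dataset (path, mtime) VALUES (?, ?)",
        (dataset_path, mtime),
    )
    conn.commit()
```

With this criterion a dataset is ingested once, skipped on later runs, and picked up again only if its source changes.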

Nothing prevents the ingestion step from being appended to the scene
processing chain at the NCI, so a scene could be ingested into the AGDC
as soon as it is created.

Background Information
----------------------

The Landsat ingester was written by Matthew Hoyles and
Matthew Hardy in the first half of 2014. The Landsat-specific portions
of the code are specialisations of generic classes, so this
implementation serves as a template for ingesters for other dataset
types.

Basic Usage
-----------

### Database

An AGDC database is required; its setup is described in [the database README](../../database/README.md).

### Config file

Sample:

```ini
# AGDC configuration file for testing Landsat 8 ingestion

[datacube]
default_tile_type_id = 1

# Database connection parameters
host = 130.56.244.226
port = 6432
dbname = datacube
user = cube_admin
password = cube_admin

# User-specific temporary directory for scratch files
temp_dir = /data/tmp

# Root directory for tiles
tile_root = /data/tiles

# Dataset filter parameters for ingestion.
start_date = 01/01/2009
end_date = 30/04/2009
min_path = 88
max_path = 115
min_row = 66
max_row = 91

# List of tile types
tile_types = [1]
```
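A minimal sketch of reading the filter parameters from a file like the sample above, using Python's standard `configparser`. It assumes dates are written day/month/year (as `30/04/2009` implies) and that `tile_types` is a JSON-style list; `load_filters` is a hypothetical helper, not part of the AGDC code.

```python
import configparser
import json
from datetime import datetime

SAMPLE = """\
[datacube]
# Dataset filter parameters for ingestion.
start_date = 01/01/2009
end_date = 30/04/2009
min_path = 88
max_path = 115
min_row = 66
max_row = 91
tile_types = [1]
"""


def load_filters(text):
    """Parse the ingestion filter parameters from AGDC-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["datacube"]

    def parse_date(value):
        # Day/month/year, as implied by end_date = 30/04/2009.
        return datetime.strptime(value, "%d/%m/%Y").date()

    return {
        "start_date": parse_date(section["start_date"]),
        "end_date": parse_date(section["end_date"]),
        "path_range": (section.getint("min_path"), section.getint("max_path")),
        "row_range": (section.getint("min_row"), section.getint("max_row")),
        "tile_types": json.loads(section["tile_types"]),
    }


filters = load_filters(SAMPLE)
print(filters["start_date"], filters["end_date"], filters["path_range"])
```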

### Quick example


When installed as a Python package, the `agdc-ingest-*` commands are available in the shell:

```
$ agdc-ingest-landsat
usage: agdc-ingest-landsat [-h] [-C CONFIG_FILE] [-d] [--fastfilter]
                           [--synctime SYNC_TIME] [--synctype SYNC_TYPE]
                           --source SOURCE_DIR [--followsymlinks]
agdc-ingest-landsat: error: argument --source is required
$
```

Example ingestion of all Landsat scenes within a directory (using the config file `local.conf`):

```
$ agdc-ingest-landsat -d -C local.conf --source /data/inputs/ls8/nbar/
```

### Ingesters

See the README files of each individual ingester for usage instructions and limitations.

- [Landsat](landsat/README.md)
- [Modis](modis/README.md)
- [WOfS](wofs/README.md)

NCI
---

A module is available for use within the NCI environment.

Ensure the agdc project is on your module path:

```
export MODULEPATH="/projects/el8/opt/modules/modulefiles:$MODULEPATH"
```

Load and run:

```
$ module load agdc
$ agdc-ingest-landsat ...
```

This loads the latest stable release of AGDC. For more version stability, always load a specific version number:

```
$ module load agdc/1.2.0
```


183 changes: 183 additions & 0 deletions agdc/ingest/landsat/README.md
@@ -0,0 +1,183 @@

Landsat Ingestion
=================

Example usage
-------------

```
agdc-ingest-landsat --config agdc_ls8_test.conf --source \
/g/data/rs0/scenes/ARG25_V0.0/2014-05/LS8_OLI_TIRS_NBAR_P54_GANBAR01-002_115_077_20140507
/g/data/rs0/scenes/ARG25_V0.0/2014-05/LS8_OLI_TIRS_NBAR_P54_GANBAR01-002_115_078_20140507
```

Extra Parameters
----------------

There are additional command line options, but they are primarily used for
testing and debugging.

Usage instructions are as follows:

```
usage: agdc-ingest-landsat [-h] [-C CONFIG_FILE] [-d] --source SOURCE_DIR
                           [--followsymlinks] [--fastfilter] [--synctime SYNC_TIME]
                           [--synctype SYNC_TYPE]

optional arguments:
  -h, --help            show this help message and exit
  -C CONFIG_FILE, --config CONFIG_FILE
                        LandsatIngester configuration file
  -d, --debug           Debug mode flag
  --source SOURCE_DIR   Source root directory containing datasets
  --followsymlinks      Follow symbolic links when finding datasets to ingest
  --fastfilter          Filter datasets using filename patterns.
  --synctime SYNC_TIME  Synchronize parallel ingestions at the given time in
                        seconds after 01/01/1970
  --synctype SYNC_TYPE  Type of transaction to syncronize with synctime, one
                        of "cataloging", "tiling", or "mosaicking".
```
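`--synctime` takes a time in seconds after 01/01/1970 (a Unix timestamp). One way to compute such a value for a chosen moment, sketched with the standard library; `sync_time` is a hypothetical helper, and UTC is assumed so that all parallel jobs agree on the same instant:

```python
from datetime import datetime, timezone


def sync_time(when):
    """Return whole seconds since 01/01/1970 UTC for a timezone-aware datetime."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so every job agrees on the instant")
    return int(when.timestamp())


# Example: synchronise parallel runs at 02:00 UTC on 24 July 2015.
print(sync_time(datetime(2015, 7, 24, 2, 0, tzinfo=timezone.utc)))
```

The printed value would then be passed as `--synctime` to each parallel `agdc-ingest-landsat` run, together with the same `--synctype`.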


Dataset Requirements
--------------------

The Landsat ingester is built around GA's in-house dataset structures.

Datasets are located using folder naming conventions. The ingester
expects each dataset folder to contain a `scene01` folder holding each band as an individual GeoTIFF.
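The folder convention can be illustrated with a short directory walk. This is a sketch under assumptions, not the ingester's own code: `find_datasets` is a hypothetical helper, and mapping `--followsymlinks` onto `os.walk(followlinks=...)` is an assumption about how that option behaves.

```python
import os


def find_datasets(source_dir, follow_symlinks=False):
    """Return sorted dataset folders that contain a 'scene01' folder with GeoTIFF bands."""
    found = []
    for dirpath, dirnames, _ in os.walk(source_dir, followlinks=follow_symlinks):
        if "scene01" not in dirnames:
            continue
        scene_dir = os.path.join(dirpath, "scene01")
        if any(name.lower().endswith((".tif", ".tiff")) for name in os.listdir(scene_dir)):
            found.append(dirpath)
        # The band folder itself holds no further datasets, so don't descend into it.
        dirnames.remove("scene01")
    return sorted(found)
```

A `scene01` folder with no GeoTIFFs is skipped, as is any folder without a `scene01` subfolder.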

The dataset folder name is read to perform filtering (see `--fastfilter` above), and is
expected to be in GA's dataset ID format:

```
<Sat>_<Sensor>_<Level>_<Level Code>_<Product_Code>-<Groundstation>_<Path>_<Row>_YYYYMMDD
```

However, the ingester ignores everything except the last three fields (path, row and
acquisition date), which it matches with the regular expression `_\d{3}_\d{3}_\d{4}\d{2}\d{2}$`.
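A sketch of extracting and filtering on those three fields in Python. Capture groups are added to the expression quoted above; the inclusive range checks mirror the `min_path`/`max_path`, `min_row`/`max_row` and `start_date`/`end_date` config parameters, but whether the real ingester treats these bounds as inclusive is an assumption. `parse_dataset_id` and `passes_filter` are hypothetical names.

```python
import re
from datetime import date

# Path, row and date suffix of a GA dataset ID, with capture groups added.
SUFFIX = re.compile(r"_(\d{3})_(\d{3})_(\d{4})(\d{2})(\d{2})$")


def parse_dataset_id(folder_name):
    """Return (path, row, acquisition date), or None if the name does not match."""
    match = SUFFIX.search(folder_name)
    if match is None:
        return None
    path, row, year, month, day = (int(g) for g in match.groups())
    return path, row, date(year, month, day)


def passes_filter(folder_name, path_range, row_range, start, end):
    """Inclusive path/row/date filter, mirroring the min_/max_ and *_date config keys."""
    parsed = parse_dataset_id(folder_name)
    if parsed is None:
        return False
    path, row, acquired = parsed
    return (
        path_range[0] <= path <= path_range[1]
        and row_range[0] <= row <= row_range[1]
        and start <= acquired <= end
    )
```

Pass only the folder's base name (no trailing slash), since the expression is anchored at the end of the string.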

The following sections give example layouts for each product.

### GA NBAR (Geoscience Australia's Surface Reflectance)

Example NBAR package layouts (LS5, 7 and 8):

```
.
|-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228
|   |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228.jpg
|   |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228_FR.jpg
|   |-- md5sum.txt
|   |-- metadata.xml
|   `-- scene01
|       |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228_B10.tif
|       |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228_B20.tif
|       |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228_B30.tif
|       |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228_B40.tif
|       |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228_B50.tif
|       |-- LS5_TM_NBAR_P54_GANBAR01-002_100_081_20100228_B70.tif
|       `-- report.txt
|-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225
|   |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225.jpg
|   |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225_FR.jpg
|   |-- md5sum.txt
|   |-- metadata.xml
|   `-- scene01
|       |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225_B10.tif
|       |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225_B20.tif
|       |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225_B30.tif
|       |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225_B40.tif
|       |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225_B50.tif
|       |-- LS7_ETM_NBAR_P54_GANBAR01-002_100_080_20001225_B70.tif
|       `-- report.txt
`-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012
    |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012.jpg
    |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_FR.jpg
    |-- md5sum.txt
    |-- metadata.xml
    `-- scene01
        |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_B1.tif
        |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_B2.tif
        |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_B3.tif
        |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_B4.tif
        |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_B5.tif
        |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_B6.tif
        |-- LS8_OLI_TIRS_NBAR_P54_GANBAR01-015_101_078_20141012_B7.tif
        `-- report.txt
```



### GA Pixel Quality

Example LS5, 7 and 8:

```
.
|-- LS5_TM_PQ_P55_GAPQ01-002_112_080_20080330
|   |-- md5sum.txt
|   |-- metadata.xml
|   `-- scene01
|       |-- ACCA_CLOUD_SHADOW_LOGFILE.txt
|       |-- ACCA_LOGFILE.txt
|       |-- FMASK_CLOUD_SHADOW_LOGFILE.txt
|       |-- FMASK_LOGFILE.txt
|       `-- LS5_TM_PQ_P55_GAPQ01-002_112_080_20080330_1111111111111100.tif
|-- LS7_ETM_PQ_P55_GAPQ01-002_100_080_20000124
|   |-- md5sum.txt
|   |-- metadata.xml
|   `-- scene01
|       |-- ACCA_CLOUD_SHADOW_LOGFILE.txt
|       |-- ACCA_LOGFILE.txt
|       |-- FMASK_CLOUD_SHADOW_LOGFILE.txt
|       |-- FMASK_LOGFILE.txt
|       `-- LS7_ETM_PQ_P55_GAPQ01-002_100_080_20000124_1111111111111100.tif
`-- LS8_OLI_TIRS_PQ_P55_GAPQ01-032_114_075_20140804
    |-- md5sum.txt
    |-- metadata.xml
    `-- scene01
        |-- ACCA_CLOUD_SHADOW_LOGFILE.txt
        |-- ACCA_LOGFILE.txt
        |-- FMASK_CLOUD_SHADOW_LOGFILE.txt
        |-- FMASK_LOGFILE.txt
        `-- LS8_OLI_TIRS_PQ_P55_GAPQ01-032_114_075_20140804_1111111111111100.tif
```

### GA Fractional Cover

Example LS5, 7 and 8:

```
.
|-- LS5_TM_FC_P54_GAFC01-002_106_080_19980901
|   |-- LS5_TM_FC_P54_GAFC01-002_106_080_19980901.jpg
|   |-- md5sum.txt
|   |-- metadata.xml
|   `-- scene01
|       |-- LS5_TM_FC_P54_GAFC01-002_106_080_19980901_BS.tif
|       |-- LS5_TM_FC_P54_GAFC01-002_106_080_19980901_NPV.tif
|       |-- LS5_TM_FC_P54_GAFC01-002_106_080_19980901_PV.tif
|       `-- LS5_TM_FC_P54_GAFC01-002_106_080_19980901_UE.tif
|-- LS7_ETM_FC_P54_GAFC01-002_115_078_20140819
|   |-- LS7_ETM_FC_P54_GAFC01-002_115_078_20140819.jpg
|   |-- md5sum.txt
|   |-- metadata.xml
|   `-- scene01
|       |-- LS7_ETM_FC_P54_GAFC01-002_115_078_20140819_BS.tif
|       |-- LS7_ETM_FC_P54_GAFC01-002_115_078_20140819_NPV.tif
|       |-- LS7_ETM_FC_P54_GAFC01-002_115_078_20140819_PV.tif
|       `-- LS7_ETM_FC_P54_GAFC01-002_115_078_20140819_UE.tif
`-- LS8_OLI_TIRS_FC_P54_GAFC01-032_115_075_20140827
    |-- LS8_OLI_TIRS_FC_P54_GAFC01-032_115_075_20140827.jpg
    |-- md5sum.txt
    |-- metadata.xml
    `-- scene01
        |-- LS8_OLI_TIRS_FC_P54_GAFC01-032_115_075_20140827_BS.tif
        |-- LS8_OLI_TIRS_FC_P54_GAFC01-032_115_075_20140827_NPV.tif
        |-- LS8_OLI_TIRS_FC_P54_GAFC01-032_115_075_20140827_PV.tif
        `-- LS8_OLI_TIRS_FC_P54_GAFC01-032_115_075_20140827_UE.tif
```

29 changes: 29 additions & 0 deletions agdc/ingest/modis/README.md
@@ -0,0 +1,29 @@

Modis Ingestion
===============

Ingest Modis datasets in NetCDF format.

Example Usage
-------------

```
agdc-ingest-modis -C agdc-test-environment.conf \
    --source /g/data/v10/projects/ingest_test_data/input/modis
```

Datasets
--------

The Modis ingester searches for any files matching `MOD*.nc`.

Example source:

```
.
`-- 2010365
    |-- MOD09_L2.2010365.0135.20130130162407.remapped_swath_500mbands_0.005deg.nc
    |-- MOD09_L2.2010365.0315.20130130162407.remapped_swath_500mbands_0.005deg.nc
    |-- MOD09_L2.2010365.2300.20130130162407.remapped_swath_500mbands_0.005deg.nc
    |-- orbit_58697.logs.tgz
    |-- orbit_58698.logs.tgz
    `-- orbit_58710.logs.tgz
```
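A minimal sketch of that search, assuming the source directory is scanned recursively (the example keeps the files in a per-day subfolder such as `2010365`); `find_modis_files` is a hypothetical helper:

```python
import fnmatch
import os


def find_modis_files(source_dir):
    """Recursively collect files whose names match MOD*.nc, ignoring everything else."""
    matches = []
    for dirpath, _, filenames in os.walk(source_dir):
        for name in fnmatch.filter(filenames, "MOD*.nc"):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

In the example layout, the `orbit_*.logs.tgz` archives are not matched.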

28 changes: 28 additions & 0 deletions agdc/ingest/wofs/README.md
@@ -0,0 +1,28 @@

Water Observations from Space (WOfS) Ingestion
==============================================

The WOfS ingester differs from the Landsat and Modis ingesters in that it ingests tiles
produced by the Data Cube rather than data from external sources.

These tiles are already in the correct projection and tile bounds, so they are ingested in place
without modification. The ingester indexes the datasets but does not copy them, so the source
location must remain permanent.

Example Usage
-------------

```
agdc-ingest-wofs -C agdc-test-environment.conf \
    --source /g/data/v10/projects/ingest_test_data/wofs
```

Datasets
--------

The WOfS ingester searches for any files matching `*_WATER_*.tif`.

Example:

```
.
└── LS7_ETM_WATER_154_-026_2012-05-18T23-36-13.518391.tif
```
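A sketch of the discovery step, keeping absolute paths because the files are indexed in place rather than copied. Reading the acquisition time from the last `_`-separated field is an assumption based only on the example file name above; `find_wofs_tiles` and `acquisition_time` are hypothetical helpers.

```python
import fnmatch
import os
from datetime import datetime


def find_wofs_tiles(source_dir):
    """Return absolute paths of *_WATER_*.tif files; they are indexed where they lie."""
    found = []
    for dirpath, _, filenames in os.walk(source_dir):
        for name in fnmatch.filter(filenames, "*_WATER_*.tif"):
            found.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(found)


def acquisition_time(filename):
    """Hypothetical: parse the timestamp in the last '_' field of a WOfS tile name."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    stamp = stem.rsplit("_", 1)[1]
    return datetime.strptime(stamp, "%Y-%m-%dT%H-%M-%S.%f")
```

Storing absolute paths is what makes the "source location must remain permanent" requirement matter: moving the files would leave the index pointing at nothing.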

3 changes: 1 addition & 2 deletions database/README.md
@@ -1,5 +1,4 @@

# Database migrations
# AGDC Databases

## Initial setup
