Cumulus is an open source cloud-based data ingest, archive, distribution, and management framework developed for NASA's future Earth Science data streams. This repo supports the development, deployment, and testing of Cumulus and supplies useful tips on configuration, workflow management, and operations. To learn more about Cumulus and NASA's Earth Observing System Data and Information System (EOSDIS) cloud initiatives go to More Information.
Below is in-depth guidance to help get you started with your Cumulus development. To get a quick start on Cumulus deployment go to our Getting Started section.
- Documentation for the latest released version.
- Documentation for the unreleased work.
- Documentation: How To's when serving and updating the documentation.
The Cumulus core repo is a monorepo managed by Lerna. Lerna is responsible for installing the dependencies of the packages and tasks that belong in this repo. In general, Cumulus's npm packages can be found in the packages directory, and workflow tasks can be found in the tasks directory.
To help cut down on the time and disk space required to install the dependencies of the packages in this monorepo, all devDependencies are defined in the top-level package.json. The Node module resolution algorithm allows all of the packages and tasks to find their dev dependencies in that top-level node_modules directory.
TL;DR - If you need to add a devDependency to a package, add it to the top-level package.json file, not the package.json associated with an individual package.
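The lookup described above can be sketched in a few lines of shell. This is only an illustration of the directory walk Node performs for a bare module name, not code from Cumulus or Node itself; `find_module_dir` and the module name in the example are hypothetical.

```shell
# Sketch: mimic how Node looks for a bare module name by checking
# node_modules in the starting directory, then in each parent directory.
# find_module_dir is a hypothetical helper, not part of Cumulus or Node.
find_module_dir() {
  module_name="$1"
  dir="$(cd "$2" && pwd)" || return 1
  while :; do
    if [ -d "$dir/node_modules/$module_name" ]; then
      printf '%s\n' "$dir/node_modules/$module_name"
      return 0
    fi
    # Stop once the filesystem root has been checked.
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}

# Example (hypothetical module name), run from the repo root:
#   find_module_dir some-dev-tool packages/common
```

Because the walk continues upward until a match is found, a package several levels below the repo root still finds a dev dependency installed once in the top-level node_modules.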
These instructions cover installation for Cumulus development. See the Cumulus deployment section for instructions on deploying the released Cumulus packages.
- NVM and node version 16.19.0
- AWS CLI
- Bash
- Docker (only required for testing)
- docker-compose (only required for testing; install with pip install docker-compose)
- Python 3.10
- pipenv
You may use brew to install the prerequisites. Visit the Homebrew documentation for guidance.
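Before moving on, it can help to confirm that the prerequisites are actually on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper, and the command names in the example are the usual executable names for the tools listed above, which may differ on your system.

```shell
# missing_tools prints each named command that is not on PATH.
# Hypothetical helper; it does not check versions.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example:
#   missing_tools node aws bash docker docker-compose python3 pipenv
```

Note that nvm is loaded as a shell function rather than installed as an executable, so it is only reported as present in shells where it has been sourced.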
Install the correct node version:
nvm install
nvm use
We use Lerna to manage multiple Cumulus packages in the same repo. You need to install Lerna as a global module first:
npm install -g lerna
We use npm for local package management. Run the following to get your dependencies set up.
npm install
npm run bootstrap
Build all packages:
npm run build
Build and watch packages:
npm run watch
To add new packages go to Adding New Packages for guidance.
Start the API:
npm run serve
Or start the distribution API:
npm run serve-dist
See the API package documentation for more options.
LocalStack provides local versions of most AWS services for testing.
The LocalStack repository has installation instructions.
LocalStack is included in the docker-compose file. You only need to run the docker-compose command in the next section to use it with your tests.
Turn on the docker containers first:
npm run start-unit-test-stack
Stop the LocalStack/unit test services:
npm run stop-unit-test-stack
npm run db:local:migrate
The tests can be run against an Elasticsearch server running in AWS. This is useful if you are using an ARM-equipped Mac and are unable to run the old Intel version of Elasticsearch in Docker. These instructions assume that you have a deployment of Cumulus available, and the deployment name is "EXAMPLE".
- The AWS CLI is installed
- The Session Manager plugin for the AWS CLI is installed
- jq is installed
- Your Cumulus deployment specified a key_name in cumulus-tf/terraform.tfvars that will grant you access to the EC2 instances that are part of that deployment
- You are able to SSH into one of your EC2 instances (you are connected to a NASA VPN if required)
Add the following to your ~/.ssh/config file:
Host i-*
User ec2-user
ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
Open an SSH tunnel to Elasticsearch with the following command.
./bin/es-tunnel.sh EXAMPLE
At this point you can send requests to https://localhost:8443 and get responses from your Elasticsearch domain running in AWS. Note that, because you're tunneling TLS-encrypted traffic, the certificates are not going to match. The test code handles this already but, if you're using curl, make sure to use the -k option to disable strict certificate checks.
$ curl -k https://localhost:8443
{
"name" : "ABC123",
"cluster_name" : "123:abc-es-vpc",
"cluster_uuid" : "abc-Ti6N3IA2ULvpBQ",
"version" : {
"number" : "5.3.2",
"build_hash" : "6bc5aba",
"build_date" : "2022-09-02T09:03:07.611Z",
"build_snapshot" : false,
"lucene_version" : "6.4.2"
},
"tagline" : "You Know, for Search"
}
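If you want to check the tunnel from a script rather than by eye, one option is to pull the version number out of that response. The sketch below uses only sed so that it does not depend on extra tooling; `es_version` is a hypothetical helper, and it assumes the response has the shape shown above.

```shell
# es_version reads an Elasticsearch root-endpoint response on stdin and
# prints the value of version.number. Hypothetical helper; it is a plain
# text match on the "number" key, not a JSON parser.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (requires the tunnel to be open):
#   curl -sk https://localhost:8443 | es_version
```

An empty result suggests the tunnel is not up or the endpoint returned something other than the expected JSON.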
With the tunnel configured, you can now run the tests with the following command:
env \
LOCAL_ES_HOST_PORT=8443 \
LOCAL_ES_HOST_PROTOCOL=https \
LOCAL_ES_HOST=localhost \
LOCALSTACK_HOST=127.0.0.1 \
npm test
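To double-check which endpoint a given set of variables points at before starting a long test run, you could print the URL they describe. This is a sketch only: `es_endpoint` is a hypothetical helper, and the fallback values are assumptions (9200 is Elasticsearch's standard HTTP port), not necessarily the defaults the Cumulus test code uses.

```shell
# es_endpoint prints the URL described by the LOCAL_ES_* variables.
# Hypothetical helper; the fallback values are assumptions for illustration.
es_endpoint() {
  printf '%s://%s:%s\n' \
    "${LOCAL_ES_HOST_PROTOCOL:-http}" \
    "${LOCAL_ES_HOST:-localhost}" \
    "${LOCAL_ES_HOST_PORT:-9200}"
}

# Example, using the same values as the tunnel command:
#   LOCAL_ES_HOST_PORT=8443 LOCAL_ES_HOST_PROTOCOL=https LOCAL_ES_HOST=localhost es_endpoint
```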
Run the test commands next:
export LOCAL_ES_HOST=127.0.0.1
export LOCALSTACK_HOST=127.0.0.1
npm test
If the tests pass, run the coverage tests:
export LOCAL_ES_HOST=127.0.0.1
export LOCALSTACK_HOST=127.0.0.1
npm run test:coverage
These tests will fail if coverage drops below certain thresholds or if unit tests fail.
An environment variable can be set to measure coverage without enforcing the thresholds:
export FAIL_ON_COVERAGE=false
npm run test:coverage
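The behavior of this switch can be pictured with a small sketch. It is not the Cumulus coverage script; `coverage_gate` is a hypothetical function, and the percentages in the example are made up. It only illustrates the idea of reporting a shortfall while letting FAIL_ON_COVERAGE=false suppress the failure.

```shell
# coverage_gate ACTUAL THRESHOLD
# Hypothetical illustration of a coverage threshold check.
# Returns non-zero when ACTUAL is below THRESHOLD, unless
# FAIL_ON_COVERAGE=false, in which case it only reports.
coverage_gate() {
  actual="$1"
  threshold="$2"
  # awk handles decimal percentages; it exits 0 when actual < threshold.
  if awk -v a="$actual" -v t="$threshold" 'BEGIN { exit !(a + 0 < t + 0) }'; then
    echo "coverage $actual% is below threshold $threshold%"
    [ "${FAIL_ON_COVERAGE:-true}" = "false" ] && return 0
    return 1
  fi
  echo "coverage $actual% meets threshold $threshold%"
}

# Example (made-up numbers):
#   FAIL_ON_COVERAGE=false coverage_gate 85 90
```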
Additionally, you can facilitate updating coverage values with the included coverage script:
npm run coverage -- --update
For more information please read this.
Copy the .vscode.example directory to .vscode to create your debugger launch configuration. Refer to the VS Code documentation on how to use the debugger.
For more information please read this.
Create a new folder under packages if it is a common library, or create a folder under cumulus/tasks if it is a Lambda task. cd to the folder and run npm init.
Make sure to name the package as @cumulus/package-name.
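After running npm init, a quick check that the new package.json carries the @cumulus scope might look like the sketch below. `has_cumulus_scope` is a hypothetical helper, and it is a simple text match rather than a JSON parse, so treat it as a convenience check only.

```shell
# has_cumulus_scope DIR
# Succeeds when DIR/package.json contains a "name" value starting with
# @cumulus/. Hypothetical helper; a text match, not a JSON parser, so a
# nested "name" key with that prefix would also satisfy it.
has_cumulus_scope() {
  grep -Eq '"name"[[:space:]]*:[[:space:]]*"@cumulus/[^"]+"' "$1/package.json"
}

# Example, run from inside the new package folder:
#   has_cumulus_scope . || echo "package name is missing the @cumulus/ scope"
```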
lerna exec -- rm -rf ./package-lock.json
npm run clean
Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for more information.
To release a new version of Cumulus, read this.
For more information about this project or more about NASA's Earth Observing System Data and Information System (EOSDIS) and its cloud work, please contact Katie Baynes or visit us at https://earthdata.nasa.gov.