Merge pull request #7 from developmentseed/feat/rename-and-restructure
Feat/rename and restructure
abarciauskas-bgse authored Aug 14, 2023
2 parents e5fbd9a + f0e473a commit 0f957a6
Showing 21 changed files with 469 additions and 99 deletions.
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# Cloud-Optimized Data Guide (In development)
# Cloud-Optimized Geospatial Formats Guide

See the site [https://developmentseed.org/cloud-optimized-data-guide/](https://developmentseed.org/cloud-optimized-data-guide/)
See the site [https://developmentseed.org/cloud-optimized-geospatial-formats-guide/](https://developmentseed.org/cloud-optimized-geospatial-formats-guide/)

This site is built using [Quarto](https://quarto.org/docs/get-started/).
To preview the site locally, install Quarto and run:

```sh
quarto render --to html
quarto preview
```
43 changes: 18 additions & 25 deletions _quarto.yml
@@ -1,10 +1,13 @@
project:
type: website
preview:
port: 4200
browser: false

website:
page-navigation: true
title: 👷 IN DEVELOPMENT - Cloud-Optimized Data Guide
repo-url: https://github.com/developmentseed/cloud-optimized-data-guide
title: Cloud-Optimized Geospatial Formats Guide
repo-url: https://github.com/developmentseed/cloud-optimized-geospatial-formats-guide
repo-actions: [edit, issue]

page-footer:
@@ -16,37 +19,27 @@ website:
align: center
tools:
- icon: github
href: https://github.com/NASA-IMPACT/cloud-optimized-data-guide
text: "Cloud-Optimized Data Guide"
href: https://github.com/NASA-IMPACT/cloud-optimized-geospatial-formats-guide
text: "Cloud-Optimized Geospatial Formats Guide"

style: "docked"
search: true
collapse-level: 1
collapse-level: 2
contents:
- href: index.qmd
text: Welcome
- section: Formats for N-Dimensional Gridded Data
- href: overview.qmd
text: Overview Slides
- section: Formats
contents:
- section: Cloud-Optimized GeoTIFFs (COGs)
- section: cloud-optimized-geotiffs.ipynb
contents:
- cloud-optimized-geotiffs/guide.ipynb
- section: Zarr + Kerchunk
contents:
- text: Zarr + Kerchunk
- section: Cloud-Optimized HDF5 + NetCDF
contents:
- text: Cloud-Optimized HDF5 + NetCDF
- section: Formats for Point and Vector data
contents:
- section: Cloud-Optimized Point Clouds (COPC)
contents:
- text: Cloud-Optimized Point Clouds (COPC)
- section: GeoParquet
contents:
- text: GeoParquet
- section: Flatgeobuf
contents:
- text: Flatgeobuf
- cogs-examples.ipynb
- section: Zarr and Kerchunk
- section: Cloud-Optimized HDF5 and NetCDF
- section: Cloud-Optimized Point Clouds (COPC)
- section: GeoParquet
- section: Flatgeobuf

format:
html:
91 changes: 91 additions & 0 deletions cloud-optimized-geotiffs.ipynb
@@ -0,0 +1,91 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e84fbc0b",
"metadata": {},
"source": [
"# Cloud-Optimized GeoTIFFs\n",
"\n",
"## What is a Cloud-Optimized GeoTIFF?\n",
"\n",
"Cloud-Optimized GeoTIFF (COG) is a variant of the TIFF image format that specifies a particular layout of internal data in the GeoTIFF specification to allow for optimized (subsetted or aggregated) access over a network for display or data reading. The key components are overviews and internal tiling.\n",
"\n",
"For more details see https://www.cogeo.org/\n",
"\n",
"<img alt=\"COG Diagram\" src=\"./images/cog-diagram-1.png\" width=300/>\n",
"\n",
"### Dimensions\n",
"\n",
"Dimensions are the number of bands, rows and columns stored in a GeoTIFF. There is a tradeoff between storing lots of data in one GeoTIFF and storing less data in many GeoTIFFs. The larger a single file, the larger the GeoTIFF header, and multiple requests may be required just to read the spatial index before any data is retrieved. The opposite problem occurs with too many small files: it takes many reads to retrieve the data, which can greatly increase load time when rendering a combined visualization.\n",
"\n",
"If you plan to pan and zoom a large amount of data through a tiling service in a web browser, there is a tradeoff between one large file and many smaller files. The current recommendation is to meet somewhere in the middle, with a moderate number of medium-sized files.\n",
"\n",
"### Internal Blocks\n",
"\n",
"This attribute is also sometimes called **chunks** or **internal tiles**.\n",
"\n",
"Internal blocks are required if the dimensions of data are over 512x512. However, you can control the size of the internal blocks; 256x256 or 512x512 are recommended. When displaying data at full resolution or reading part of the data, the block size affects the number of reads required. A block size of 256 takes less time per read and reads less data outside the desired bounding box, but reading large parts of a file may take more total read requests. Some clients will aggregate neighboring block reads to reduce the total number of requests.\n",
"\n",
"### Overviews\n",
"\n",
"Overviews are downsampled (aggregated) data intended for visualization.\n",
"The best resampling algorithm depends on the range, type, and distribution of the data.\n",
"\n",
"The smallest overview should match the tiling components’ fetch size, typically 256x256. Because aspect ratios vary, aim to have at least one dimension at or slightly less than 256. The COG driver in GDAL and the rio-cogeo tools should do this.\n",
"\n",
"There are many resampling algorithms for generating overviews. When creating overviews, compare several options before deciding which resampling method to apply."
]
},
{
"cell_type": "markdown",
"id": "eb04e257-322a-4d42-b6d6-e75f1c587a69",
"metadata": {},
"source": [
"## How to create and validate COGs\n",
"\n",
"1. [Rio-cogeo: GitHub - cogeotiff/rio-cogeo: Cloud Optimized GeoTIFF creation and validation plugin for rasterio](https://github.com/cogeotiff/rio-cogeo)\n",
"2. [Gdal: COG – Cloud Optimized GeoTIFF generator — GDAL documentation](https://gdal.org/drivers/raster/cog.html)"
]
},
{
"cell_type": "markdown",
"id": "d193ab02-bb69-455e-9b72-5b89728f086e",
"metadata": {},
"source": [
"## Additional Resources\n",
"\n",
"* [Planet Blog: An Introduction to Cloud Optimized GeoTIFFS (COGs) Part 1: Overview](https://developers.planet.com/docs/planetschool/an-introduction-to-cloud-optimized-geotiffs-cogs-part-1-overview/)\n",
"* [COG Talk — Part 1: What’s new?](https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1)\n",
"* [Development Seed Blog: Do you really want people using your data?](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f)\n",
"\n",
"## How to visualize COGs\n",
"\n",
"* GDAL vsi* drivers (vsicurl, vsis3, vsiaz)\n",
"* Titiler https://github.com/developmentseed/titiler\n",
"* Rio-viz https://github.com/developmentseed/rio-viz"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:geospatial]",
"language": "python",
"name": "conda-env-geospatial-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
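The trade-offs described in the notebook above (block size versus request count, and how many overview levels are needed to reach roughly 256 pixels) can be sketched with plain arithmetic. This is a minimal illustration, not how GDAL or rio-cogeo compute these values: the function names `blocks_touched` and `overview_factors`, the power-of-two decimation, and the example window are all assumptions made for this sketch.

```python
def blocks_touched(window, block_size):
    """Count the internal blocks intersected by a pixel window.

    window is (col_off, row_off, width, height) in pixels; blocks are
    assumed to be square and aligned to the top-left corner of the image.
    """
    col_off, row_off, width, height = window
    first_col = col_off // block_size
    last_col = (col_off + width - 1) // block_size
    first_row = row_off // block_size
    last_row = (row_off + height - 1) // block_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)


def overview_factors(width, height, target=256):
    """List power-of-two decimation factors, stopping once the smaller
    image dimension is at or below `target` pixels."""
    factors = []
    factor = 1
    while min(width, height) / factor > target:
        factor *= 2
        factors.append(factor)
    return factors


if __name__ == "__main__":
    window = (100, 100, 1000, 1000)  # hypothetical read window
    for size in (256, 512):
        n_blocks = blocks_touched(window, size)
        pixels = n_blocks * size * size
        print(f"block size {size}: {n_blocks} blocks, {pixels} pixels fetched")
    print("overview factors:", overview_factors(10000, 8000))
```

For the hypothetical window above, the smaller block size touches more blocks (more potential requests) but fetches fewer pixels outside the window, which is the trade-off the Internal Blocks section describes. Clients that merge neighboring block reads would reduce the request count further.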
68 changes: 2 additions & 66 deletions cloud-optimized-geotiffs/guide.ipynb → cogs-examples.ipynb
@@ -5,71 +5,7 @@
"id": "e84fbc0b",
"metadata": {},
"source": [
"# Guide for Creating and Testing Cloud-Optimized GeoTIFFs\n",
"\n",
"This is a living document, open for contributions from and to be shared with the open geospatial community. Please add to any section you feel compelled to contribute questions or answers to.\n",
"\n",
"The document is broadly organized into “What we know”, “What we don’t know”, Links to examples for generating or using Cloud-Optimized GeoTIFFs (COGs) and an open question and answer section.\n",
"\n",
"## What we know\n",
"\n",
"Cloud-Optimized GeoTIFF is a variant of the TIFF image format that specifies a particular layout of internal data in the GeoTIFF specification to allow for optimized (subsetted or aggregated) access over a network for display or data reading. The key components are overviews, and internal tiling.\n",
"\n",
"For more details see https://www.cogeo.org/\n",
"\n",
"<img alt=\"COG Diagram\" src=\"../images/cog-diagram-1.png\" width=300/>"
]
},
{
"cell_type": "markdown",
"id": "eb04e257-322a-4d42-b6d6-e75f1c587a69",
"metadata": {},
"source": [
"# Tools for working with COGs\n",
"\n",
"## How to create and validate COGs\n",
"\n",
"1. [Rio-cogeo: GitHub - cogeotiff/rio-cogeo: Cloud Optimized GeoTIFF creation and validation plugin for rasterio](https://github.com/cogeotiff/rio-cogeo)\n",
"2. [Gdal: COG – Cloud Optimized GeoTIFF generator — GDAL documentation](https://gdal.org/drivers/raster/cog.html)"
]
},
{
"cell_type": "markdown",
"id": "75f9efd9-91fa-4142-bbde-6358a65f7c9e",
"metadata": {},
"source": [
"# Questions to ask when generating COGs\n",
"\n",
"1. What variable(s) should be included in the COG?\n",
"2. Will you create multi-band COGs with variables or a host of single band COGs with variable naming conventions?\n",
"3. What is the intended use case or usage profile? Will these COGs be used for visualization, analysis or both?\n",
"4. What is the expected access method?\n",
"5. How much of your data is typically rendered or selected at once? All to very select subsets?\n",
"b"
]
},
{
"cell_type": "markdown",
"id": "21267789-f1b8-4e91-bcf5-460721af6b09",
"metadata": {},
"source": [
"## How to visualize COGs\n",
"\n",
"* GDAL vis* drivers (vsicurl, vsis3, vsiaz,)\n",
"* Titiler https://github.com/developmentseed/titiler\n",
"* Rio-viz https://github.com/developmentseed/rio-viz\n",
"\n",
"## How to process and subset COGs\n",
"\n",
"* [stackstac](https://stackstac.readthedocs.io/en/latest/)\n",
"* [rasterio](https://rasterio.readthedocs.io/)\n",
"* [rio-tiler](https://github.com/cogeotiff/rio-tiler)\n",
"* [geotiff.js](https://geotiffjs.github.io/)\n",
"\n",
"## How to catalog COGs\n",
"\n",
"* [rio-stac](https://developmentseed.org/rio-stac/intro/)\n",
"* [TomAugspurger/xstac: STAC from xarray](https://github.com/stac-utils/xstac)\n"
"# Examples of Working with COGs"
]
},
{
@@ -135,7 +71,7 @@
"source": [
"## Download a GeoTIFF from EarthData\n",
"\n",
"Note: The whole point of cloud-optimized data is that we _don't_ download data. So in future examples, we will demonstrate how to access just subsets of data using COG and compare that with a GeoTIFF."
"Note: The whole point of cloud-optimized data is that we _don't_ download data. So in future examples, we will demonstrate how to access just subsets of data using COG and compare that with a GeoTIFF."
]
},
{
10 changes: 10 additions & 0 deletions contributing.qmd
@@ -0,0 +1,10 @@
---
title: Contributing
subtitle: Guidelines for contributing
---

We welcome contributions to this guide. If you wish to contribute, please consider the following tenets:

* All examples should use open data. If an example uses Earthdata, it must include an example of how to provide credentials.
* Landing pages with no code should use [Quarto Markdown (`.qmd`)](https://quarto.org/docs/authoring/markdown-basics.html).
* Pages with executable code should be [IPython Notebooks (`.ipynb`)](https://ipython.org/notebook.html).
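
One possible way to satisfy the credentials tenet above is to read Earthdata Login credentials from a `.netrc` file with the Python standard library. This is only a sketch under assumptions: the helper name `read_earthdata_credentials` is hypothetical, and it assumes the credentials are stored under the Earthdata Login host `urs.earthdata.nasa.gov`. Examples in this guide may use a different mechanism.

```python
import netrc
from pathlib import Path

# Earthdata Login host; credentials are assumed to be stored under this
# machine name in the netrc file.
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def read_earthdata_credentials(netrc_path=None):
    """Return (username, password) for Earthdata Login from a netrc file.

    Hypothetical helper for illustration; defaults to ~/.netrc.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    auth = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    if auth is None:
        raise LookupError(f"No entry for {EARTHDATA_HOST} in {path}")
    login, _account, password = auth
    return login, password
```

A matching `.netrc` entry would look like `machine urs.earthdata.nasa.gov login <username> password <password>`. Keeping credentials in a file outside the notebook avoids committing them to the repository.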
13 changes: 13 additions & 0 deletions custom.scss
@@ -0,0 +1,13 @@
/*-- scss:defaults --*/

// fonts
$presentation-font-size-root: 27px !default;

// colors
$body-color: #000 !default;
$selection-bg: #26351c !default;

// headings
$presentation-heading-color: #333 !default;

/*-- scss:rules --*/
Binary file added images/2019-points-lines-polygons.png
Binary file added images/cog-overviews.png
Binary file added images/copc-vlr-chunk-table-illustration.png
Binary file added images/fgb_diagram_2.png
Binary file added images/gpq_query_window.png
Binary file added images/higher-level-libraries.png
Binary file added images/logo.png
Binary file added images/multi_refs.png
Binary file added images/tile-diagram.png
Binary file added images/type-format-support-matrix.png
Binary file added images/xarray-datastructure.png
38 changes: 33 additions & 5 deletions index.qmd
@@ -1,12 +1,40 @@
---
title: "Cloud-Optimized Data Guide"
subtitle: "Methods for Generating and Testing Cloud-Optimized Data"
title: "Cloud-Optimized Geospatial Formats Guide"
subtitle: "Methods for Generating and Testing Cloud-Optimized Geospatial Formats"
---

The audience for this guide is any data provider wishing to make their data accessible without users needing to download entire files.
# Who this guide is for

If you wish to provide optimized access to geospatial data, this guide is for you. Given the size of geospatial data, now and in the future, users can no longer rely on file download to achieve their science goals.

## Built for the community, by the community.

There is no one-size-fits-all approach to cloud-optimized data, but the community has developed many tools for creating and assessing cloud-optimized data formats that should be organized and shared.
There is no one-size-fits-all approach to cloud-optimized data, but the community has developed many tools for creating and assessing geospatial formats that should be organized and shared.

With this guide, we provide the landscape of cloud-optimized geospatial formats and provide the best-known answers to common questions.

## The Opportunity

Just putting data on the cloud does not solve the big geospatial data problem. Massive archives of data must be available via subsetting services in order for users to work with the data in-memory. Traditional geospatial formats are optimized for on-disk access via small internal chunks. Reading over a network introduces latency, so the number of requests must be considered. The file format must support subsetted access via addressable chunks, internal tiling or both. These characteristics allow for parallelized and partial reading.
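
To make the role of latency and request count concrete, here is a rough cost model. It is a sketch under stated assumptions, not a benchmark: the function name `estimated_read_seconds`, the 50 ms latency, the 50 MB/s bandwidth, and the file and chunk sizes are all illustrative values chosen for this example.

```python
import math


def estimated_read_seconds(n_requests, total_bytes, latency_s=0.05,
                           bandwidth_bytes_per_s=50e6, concurrency=1):
    """Rough time to fetch data over a network.

    Each round of up to `concurrency` parallel requests pays one latency
    cost; the bytes themselves are limited by the available bandwidth.
    """
    if n_requests < 1 or concurrency < 1:
        raise ValueError("n_requests and concurrency must be at least 1")
    rounds = math.ceil(n_requests / concurrency)
    return rounds * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical scenario: a 1 GB file versus nine 1 MB chunks covering
# the area of interest.
whole_file = estimated_read_seconds(n_requests=1, total_bytes=1e9)
subset_serial = estimated_read_seconds(n_requests=9, total_bytes=9e6)
subset_parallel = estimated_read_seconds(n_requests=9, total_bytes=9e6,
                                         concurrency=9)

print(f"whole file:         {whole_file:.2f} s")
print(f"subset, serial:     {subset_serial:.2f} s")
print(f"subset, 9 parallel: {subset_parallel:.2f} s")
```

Under these assumptions, partial reads avoid most of the transfer time, and issuing the chunk requests in parallel hides most of the per-request latency. Real performance depends on the storage service, the client, and the chunk layout.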

## Table of Contents

1. [Overview of Formats (slideshow)](./overview.qmd)
2. Formats
a. [Cloud-Optimized GeoTIFFs](./cloud-optimized-geotiffs.ipynb)
b. Zarr and Kerchunk - COMING SOON
c. Cloud-Optimized HDF5 - COMING SOON
d. Geoparquet - COMING SOON
e. Flatgeobuf - COMING SOON
3. Cookbooks
a. [Zarr Visualization Cookbook - IN DEVELOPMENT](https://nasa-impact.github.io/zarr-visualization-cookbook/)

## Questions to ask when generating cloud-optimized geospatial data in any format

1. What variable(s) should be included in the new data format?
2. Will you create copies to optimize for different needs?
3. What is the intended use case or usage profile? Will this product be used for visualization, analysis or both?
4. What is the expected access method?
5. How much of your data is typically rendered or selected at once? Everything, or only small subsets?


With this guide, we provide the landscape of cloud-optimized data formats and provide the best-known answers to common questions.