COG physical arrangements, best ways to decode/cache and encode other data (vector ?) #7

Open
Farkal opened this issue Jul 13, 2020 · 0 comments


Farkal commented Jul 13, 2020

I think you should also add a diagram of the possible COG physical arrangements. Here is what I found on https://www.fileformat.info/format/tiff/egff.htm
image

tiff_architecture.drawio.tar.gz

I added the fourth arrangement based on what I read in this spec. I also see that you add an area for values of TIFF tags that don't fit inline in the IFD, such as TileOffsets, TileByteCounts and GeoTIFF keys, but there is an open issue asking whether this area goes before all the IFDs or after each IFD. The new layout is also proposed as more efficient, so I think there should be more information about the best ways to decode and cache a COG file.
Here is a guideline for getting a tile from X, Y and Z (I can make a PR to add it to the website or to the spec):

Decoding a tile (X, Y, Z):

  • Make a first request for the first 1024 bytes
  • Decode the IFDs in memory
  • If the response ends before the end of an IFD, make a new request starting at that IFD's offset and sized from its entry count (12 bytes per entry in classic TIFF, plus the 2-byte entry count and the 4-byte offset of the next IFD)
  • Find the IFD corresponding to Z by matching the tile matrix resolution against the resolution of each IFD, derived from the full-resolution image
  • With that IFD, compute the index into TileOffsets and TileByteCounts as Y * ceil(ImageWidth / TileWidth) + X
  • Make the HTTP range request for bytes TileOffsets[index] to TileOffsets[index] + TileByteCounts[index] - 1
  • Cache the IFD structure in memory
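
The steps above can be sketched in Python as follows. This is only an illustration under assumptions, not a reference implementation: it handles classic TIFF only (not BigTIFF), it assumes the first IFD is the full-resolution image and every following IFD is an overview, and all function names are hypothetical.

```python
import math
import struct

ENTRY_SIZE = 12  # classic TIFF entry: tag(2) + type(2) + count(4) + value/offset(4)


def first_ifd_offset(buf):
    """Return (struct byte-order prefix, offset of the first IFD) from the header."""
    order = {b"II": "<", b"MM": ">"}[bytes(buf[:2])]
    magic, offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic 43)")
    return order, offset


def ifd_end(buf, order, offset):
    """Byte position just past the IFD at `offset`, or None if the
    2-byte entry count itself is not in `buf` yet."""
    if offset + 2 > len(buf):
        return None
    (count,) = struct.unpack_from(order + "H", buf, offset)
    # entry count + entries + 4-byte offset of the next IFD
    return offset + 2 + count * ENTRY_SIZE + 4


def pick_ifd(ifd_widths, full_resolution, target_resolution):
    """Index of the IFD whose pixel size is closest to the requested one.
    Assumes ifd_widths[0] is the full-resolution ImageWidth."""
    full_width = ifd_widths[0]
    resolutions = [full_resolution * full_width / w for w in ifd_widths]
    return min(range(len(resolutions)),
               key=lambda i: abs(resolutions[i] - target_resolution))


def tile_index(x, y, image_width, tile_width):
    """Index into TileOffsets / TileByteCounts for tile column x, row y."""
    tiles_across = math.ceil(image_width / tile_width)
    return y * tiles_across + x


def tile_range_header(tile_offsets, tile_byte_counts, index):
    """HTTP Range header for one tile (the end of an HTTP range is inclusive)."""
    start = tile_offsets[index]
    end = start + tile_byte_counts[index] - 1
    return {"Range": f"bytes={start}-{end}"}
```

If `ifd_end` returns a position beyond the bytes already fetched, a follow-up range request up to that position is needed before the entries can be decoded.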

With caching we reduce the number of requests for the other tiles of the same COG. But if the image has a very large resolution, the cached structures will saturate our memory.
So I think the best architecture would be to put TileOffsets and TileByteCounts after all the IFDs. It would also be great to have the header size available as a tag in the first IFD. We would then need at most two requests to get all the IFDs: a first request of 1024 bytes and, if Tag::IFDTotalSize > 1024, a second request for the remaining bytes up to Tag::IFDTotalSize.
Store TileOffsets and TileByteCounts in memory only if they are not too large.
For all other requests to the same COG, if TileOffsets and TileByteCounts are in memory we can get a tile with one request; otherwise we need two requests.
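
A minimal sketch of this proposal, assuming the suggested IFDTotalSize tag existed (it is only a proposal in this issue, not a real TIFF tag) and assuming 4 bytes per TileOffsets / TileByteCounts value. The function names and the memory budget are hypothetical.

```python
FIRST_REQUEST = 1024  # size of the initial header request, in bytes


def header_requests(ifd_total_size, first_request=FIRST_REQUEST):
    """Inclusive byte ranges needed to read every IFD, given the value
    of the proposed (hypothetical) IFDTotalSize tag."""
    ranges = [(0, first_request - 1)]
    if ifd_total_size > first_request:
        ranges.append((first_request, ifd_total_size - 1))
    return ranges


def should_cache(tile_count, budget_bytes, bytes_per_value=4):
    """Keep TileOffsets and TileByteCounts in memory only if both
    arrays together fit in the given budget."""
    return 2 * tile_count * bytes_per_value <= budget_bytes


def requests_per_tile(index_cached):
    """One request if the offsets are cached; otherwise one extra
    request to read the offset and byte count first."""
    return 1 if index_cached else 2
```

For example, `header_requests(5000)` yields two ranges, `(0, 1023)` and `(1024, 4999)`, while a header of 800 bytes is fully covered by the first request.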

We could also add some guidelines for adding other dimensions. Today I don't know the best way to add another dimension to a COG (for example altitude). Also, how can we store vector data in a COG? On https://www.fileformat.info/format/tiff/egff.htm the author wrote:

TIFF files contain only bitmap data, although adding a few tags to support vector- or text-based images would not be a hard thing to do.

There are multiple projects trying to define a spec for vector data (for example https://github.com/planetfederal/cogj-spec), and by just adding some tags we should be able to handle vector data and get a simpler processing chain for all geographic data.
I think this spec should be a place to propose and discuss the best implementations for all these use cases.
