Create IPIP with Gateway spec for partial CAR exports #348
Comments
I'm not personally a huge fan of |
I like the idea of it being a CID! |
My understanding is that this requires basically stored context on the node you are retrieving from, so is more like extra state for resuming a broken connection than resumable selectors.
IMO range requests for CAR files seem like an iffy thing to support on gateways. In the general case they're costly to create, so asking for bytes 1000MB-1001MB of a CAR file looks like a small request but in reality is very costly on the server. Since clients and servers may be run and developed by different parties, it wouldn't be great to encourage client developers to build tooling around range requests. Sometimes they're a good idea: for example, IIUC https://github.com/filecoin-project/boost/ plans to allow for ingesting data as CAR files with range requests. However, IIUC they have a few benefits
However, I suspect in our case having range requests all the time is a bad idea and having it only some of the time is more likely to cause confusion than not. I'm by no means an expert in the various HTTP tools that exist out there though, so maybe this "sometimes range request" pattern is common enough to be worth supporting.
I don't know that I'd do this long before we put other limits on gateway usage like not downloading 100GB files over public gateways. If we want to allocate some configurable size budget for CAR + UnixFS downloads though that sounds pretty sane to me. Yes, we should definitely have some recipes of common selectors or patterns of use. It's going to be a whole new way of people accessing data and therefore of confusing people. It's possible a few will be so common that it'll be worth considering aliasing them to something easier to read in a URL bar.
This mostly makes sense to me, although there are a few footguns I think we should watch out for here. These aren't blockers and people will hopefully do mostly sane things, but IMO when writing new specs here it's better not to leave too much undefined as then you start having to assume the worst case scenario everywhere.
Perhaps off topic and related to ipfs/in-web-browsers#182, and if so lmk and we can resume there. @lidel this issue mentions CAR export with a selector like
Agree, there is dangerous resource usage asymmetry here, and no clear benefit when compared to progressive download with shallow selectors. I updated ipfs/kubo#8758 – it now returns CAR stream with
Yep, added to the TBD scope, we may extract it to separate issue.
yes
no reason to restrict. as we discussed earlier this week, TBD if we want to allow that in this mvp, or add later. |
This turns out to be more involved, as we are lacking support for Blocked until we have dag-cbor and dag-json support story cleaned up in |
I'm working on a project that will want to use this work around verifiable gateway responses. From the discussion above, am I to understand that resuming downloads of CARs will require parsing the CAR as it's downloading, keeping track of the CIDs we want but have yet to receive, then, if the download is interrupted, constructing a new request containing the missing CIDs in a selector? Especially in the low-powered servers use case, download resumption is going to be important, and if the CAR is to be served with |
There's some work ongoing for more ergonomic selectors to support parts of this. Selector support was recently added for representing the blocks that constitute a range of a unixfs file. @hannahhoward - do you have thoughts on where in go-ipfs we need to respect the unixfs reifier / LargeBytes feature detection to get the same behavior as in graphsync? |
In my mind, CAR resumption will not be sending the same request again. The idea is for the client to be smart to import as many blocks as possible, and then send follow-up requests for DAG branches which are missing. |
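A rough sketch of that client-side bookkeeping, in Python. This is only an illustration under stated assumptions: the DAG is modelled as a plain dict from CID string to child CID strings, the CIDs are placeholders, and `missing_branches` is a hypothetical helper, not part of any gateway or go-car API. A real client would decode links from each imported block instead.

```python
from typing import Dict, List, Set

# Toy model (assumption): CID -> list of child CIDs for blocks already imported.
Links = Dict[str, List[str]]


def missing_branches(root: str, received: Links) -> List[str]:
    """Walk from `root` through the imported blocks and return the CIDs
    that are linked but not yet received, in traversal order."""
    missing: List[str] = []
    seen: Set[str] = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # a DAG may link the same block more than once
        seen.add(cid)
        if cid not in received:
            # Cannot descend further; this is where a follow-up request starts.
            missing.append(cid)
            continue
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(received[cid]))
    return missing


# Blocks imported before the stream was interrupted (placeholder CIDs).
imported = {"root": ["a", "b"], "a": ["a1", "a2"], "a1": []}
todo = missing_branches("root", imported)
```

Each CID in `todo` is the root of a branch the client would ask for in a follow-up request, rather than repeating the original one.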
Dropping some notes after IPFS Thing 2022:
I am afraid this is blocked until we figure out some unified UX strategy for IPLD signaling (selectors, ADLs). |
A very relevant proposal was presented by @hannahhoward today during 5th Move the Bytes Call.
Re: detecting truncated CAR stream, there was a proposal to use CARv2 instead of CARv1, below details so we avoid revisiting it:
As part of Project Rhea, this is critical for improving performance when working with untrusted nodes so we can do better than requesting block-by-block. Initial design is happening in https://www.notion.so/pl-strflt/HTTP-Gateway-Requests-for-Graphs-as-CARs-001d2a9f5a35418bb0fb7d9d182d24ec?d=8d44d17f00344834b9b72798ca1ea117 |
Context
ipfs/kubo#8758 adds support for CAR export via Gateway.
It exports the entire DAG as a CAR stream, which does not cover all use cases.
For example, thin clients may want to export a UnixFS directory root block + its immediate children, or progressively fetch a big DAG from multiple gateway endpoints.
Why we need selector support
Scope
Proposed design (A) 💢
The go-car library supports passing selectors; the idea is to add a parameter to do just that.
We have to URL-escape the selector somehow either way, so the choice is between encodeURIComponent and multibase encoding:
Text (JSON) representation:
Binary (CBOR) representation:
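To make the two escaping options concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `selector` dict is a placeholder rather than a real IPLD selector, and JSON bytes stand in for the CBOR form because the Python standard library has no CBOR encoder. Only the escaping differs between the two helpers.

```python
import base64
import json
from urllib.parse import quote, unquote

# Placeholder selector document (assumption); not the real IPLD selector schema.
selector = {"example": {"depth": 1}}
selector_json = json.dumps(selector, separators=(",", ":"))


def percent_encode(text: str) -> str:
    """Escape text the way JavaScript's encodeURIComponent would."""
    # encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unescaped.
    return quote(text, safe="-_.!~*'()")


def multibase_base64url(data: bytes) -> str:
    """Multibase base64url without padding, identified by the 'u' prefix."""
    return "u" + base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def multibase_base64url_decode(value: str) -> bytes:
    """Inverse of multibase_base64url; restores the stripped padding."""
    if not value.startswith("u"):
        raise ValueError("expected multibase base64url ('u' prefix)")
    body = value[1:]
    return base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))


text_form = percent_encode(selector_json)
binary_form = multibase_base64url(selector_json.encode("utf-8"))
```

The percent-encoded form stays loosely readable in a URL bar, while the multibase form is opaque but can carry arbitrary binary data.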
Proposed design (B) 💢
Here {cid2} is a CID representing selector data. It could be dag-cbor or dag-json. Small ones could be inlined (with identity hash); bigger ones could be fetched once and reused efficiently.
Proposed design (C) 🤏
Over time, we realized this is the most useful and safest way.
No selector CIDs, only predefined, most useful "partial CAR export scope" parameters for now:
depth=1 means "root+direct children only" – good for fetching UnixFS dir listing with file sizes / types, or splitting bigger DAGs into partial retrievals over multiple gateways / threads
with-path will also include blocks for all parent nodes on the content path (/ipfs/{cid}/some/subpath, /ipfs/{cid}/some, and /ipfs/{cid}) – allows light clients to save round trips and take everything in a single request-response.
leaves and bytes, proposed by Hannah in Create IPIP with Gateway spec for partial CAR exports #348 (comment)
Proposed design (D) 🙏
Better ideas would be really welcome here 👀
Please comment below.
My initial thought was to have "single way of passing selectors", but if you find each approach brings value to different use cases, we could support both.
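For the predefined scope parameters of design (C), a request URL might be assembled roughly as below. This is a sketch only: `depth` and `with-path` are the names from the proposal, but the `format=car` parameter, the `true` value for `with-path`, the example gateway host and the placeholder CID are all assumptions, and `partial_car_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def partial_car_url(gateway: str, content_path: str,
                    depth: Optional[int] = None,
                    with_path: bool = False) -> str:
    """Build a gateway URL asking for a partial CAR export of content_path."""
    params = [("format", "car")]  # assumed way of requesting a CAR response
    if depth is not None:
        params.append(("depth", str(depth)))
    if with_path:
        # Value syntax is an assumption; the proposal only names the parameter.
        params.append(("with-path", "true"))
    return f"{gateway.rstrip('/')}{content_path}?{urlencode(params)}"


url = partial_car_url(
    "https://gateway.example",          # hypothetical gateway
    "/ipfs/bafyexample/some/subpath",   # placeholder CID and path
    depth=1,
    with_path=True,
)
```

With `depth=1` and `with-path` together, a light client could get the parent blocks on the content path plus a directory listing in one round trip.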
👉 NOTE: whatever we come up with here, we most likely want Kubo to support the same convention in the ipfs dag CLI (and the RPC API at /api/v0/dag/*) – see ipfs/kubo#8239