Deprecate IPFS as default publisher #3862
Comments
@wdbaruni for a user who may have signed up via their organisation's expanso-managed account, the publisher would most likely be defined organisation-wide. Previously, when I've been through security audits, many of the concerns were that data might be exfiltrated from the cluster; in those situations it would potentially be useful for us to disallow users from specifying their own publisher and rely solely on the organisation's configured publisher.
IPFS is no longer the default publisher; the default is now Noop, though it is settable. Any changes we want to make to the local publisher can go into other tickets.
The IPFS node is still embedded; re-opening to make sure we remove it.
I have updated #3816 with a recommendation for making this change over two releases.
I am going to close this issue, as IPFS as a default publisher has been deprecated in favor of the local publisher. The remaining items of work can be completed outside of this issue, which has grown into an epic.
The Problem
Today, when users run `bacalhau serve`, an embedded and private IPFS node is also created within the bacalhau process. The original intention was to let people easily test bacalhau in a self-contained way; it was not intended for production use, or for any real use beyond trying out bacalhau. The recommended approach was for people to run their own IPFS node outside of bacalhau and connect to it using `bacalhau serve --ipfs-connect <addr>`.
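For illustration, a minimal sketch of that recommended setup, assuming an IPFS daemon running on the same host and exposing its API on IPFS's default port (the multiaddr below is an example, not a value taken from this issue):

```bash
# Run your own IPFS node outside of bacalhau
ipfs daemon &

# Point bacalhau at the external node instead of the embedded one
# (adjust the multiaddr to wherever your IPFS API actually listens)
bacalhau serve --ipfs-connect /ip4/127.0.0.1/tcp/5001
```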
In addition, when users submit a job using `bacalhau docker run` without the `--publisher` flag, the CLI uses `ipfs` as the default publisher, and the job will fail if the compute node does not support IPFS.

There are many problems with our current approach, including:
- There is value in letting users connect to an existing IPFS node using `--ipfs-connect`, but no value in introducing IPFS to fresh new users as the default and recommended storage for bacalhau.

The Proposal
There are two options (the first is sketched after this list):

- Option 1: No default publisher. Jobs don't publish results by default, and users call `bacalhau job logs` to get the output (I understand `logs` needs some improvements). If they want to publish results to a remote destination, they can select the `uri` where to publish the results (e.g. `s3://...`, `ipfs://...`), and the job will be routed to the compatible compute nodes. This means that `bacalhau serve` without any flags or configuration will support jobs that don't publish results to remote destinations, which is more than enough for testing out bacalhau. In the future, we discussed an improvement where the requester node can populate job defaults, such that the requester can set a default publisher to be some S3 bucket, and users no longer need to specify the remote destination in their job submission.
- Option 2: Implement a `local` publisher that keeps results locally on the compute nodes, and make them accessible through a simple HTTP-based API that sends local results from compute nodes directly back to the client (a rough sketch of this flow follows further below). This should enable `bacalhau serve` to work out of the box, and `bacalhau get` to retrieve results not published to stdout. The critical point that I am hoping to make clear is that this should only be for testing purposes and to simplify trying out bacalhau, and shouldn't be used for actual workloads. Bacalhau is not a storage service: we don't do replication, storage monitoring, or moving data around when a node is shutting down. To avoid repeating the mistake of the embedded IPFS node becoming more than a testing tool, we should, among other things, limit `bacalhau get` when data is using `local` storage.
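As referenced above, a sketch of what the first option's flow could look like from the CLI, assuming the publisher is selected via a URI as proposed; the image, bucket name, and job ID are placeholders, not values from this issue:

```bash
# No --publisher flag: nothing is published; read the output directly
bacalhau docker run ubuntu echo hello
bacalhau job logs <job-id>

# Explicit remote destination: the job is routed only to compute nodes
# that support the selected publisher (bucket name is a placeholder)
bacalhau docker run --publisher s3://my-bucket/results ubuntu echo hello
```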
I've discussed this with @aronchick before, and he wasn't inclined toward the first option. I am putting both options here for completeness and to make sure the tradeoffs are clear.
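To make the second option concrete, a purely hypothetical sketch of how a client might interact with such a `local` publisher; the publisher name, endpoint, and port below do not exist in bacalhau today and only illustrate the shape of the idea:

```bash
# Submit a job with the (proposed) local publisher; results stay on the
# compute node that ran the execution
bacalhau docker run --publisher local ubuntu echo hello

# bacalhau get would then fetch results over the local publisher's
# HTTP API, conceptually something like:
#   GET http://<compute-node>:<port>/results/<execution-id>
bacalhau get <job-id>
```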
Priority
Tasks