
introduces the notion of forfeited job #49

Draft
wants to merge 1 commit into
base: main

@polus-arcticus
Contributor

polus-arcticus commented Jun 5, 2024

Currently, all errors that arise from the execution layer are treated as errors belonging to the requested computation. After all, one cannot guarantee execution of a faulty program. However, without assuming network consistency, errors can just as well originate with the resource provider: hardware can break, data centers can lose power, etc.

Under the current paradigm, the job creator needs to wait for the job's timeouts to expire, find the job id, and send a new tx to recover the funds, and there is no coophive recover command yet. With long timeout periods for long jobs (say a couple of months), it would be a lot nicer if the resource provider could just say 'hey, I can't run your job anymore, here is a refund',

or if the job creator could say 'hey, thanks for running so far, but I don't need it anymore, let's cancel now and prorate the computation completed'.
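As a rough illustration of what prorating could look like, here is a minimal sketch. It assumes the payment is split linearly by elapsed time against the agreed job duration; that rule, the `Prorate` function, and its parameters are hypothetical and not part of this PR or the existing contracts.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Prorate splits an escrowed payment between the resource provider and the
// job creator in proportion to the time already spent on the job.
// Hypothetical helper: linear-by-time prorating is an assumption.
func Prorate(payment, elapsed, total uint64) (toProvider, toCreator uint64) {
	if total == 0 {
		// No agreed duration: nothing can be attributed to the provider.
		return 0, payment
	}
	if elapsed > total {
		elapsed = total // never pay out more than the full payment
	}
	// Compute payment*elapsed/total with a 128-bit intermediate product so
	// that large token amounts do not overflow uint64.
	hi, lo := bits.Mul64(payment, elapsed)
	toProvider, _ = bits.Div64(hi, lo, total)
	return toProvider, payment - toProvider
}

func main() {
	// A 90-day job cancelled after 30 days, with 100 units in escrow.
	toProvider, toCreator := Prorate(100, 30, 90)
	fmt.Println("provider:", toProvider, "creator:", toCreator)
}
```

Rounding goes in the job creator's favour here because integer division truncates the provider's share; a real contract would need to pick and document its own rounding rule.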

	"DealNegotiating",
	"DealAgreed",

	"DealForfeited", // This new state can be invoked while in the DealAgreed phase, but not after

	"ResultsSubmitted",
	"ResultsAccepted",
	"ResultsChecked",
	"MediationAccepted",
	"MediationRejected",
	"TimeoutSubmitResults",
	"TimeoutJudgeResults",
	"TimeoutMediateResults",

When a deal is forfeited:
Payment is refunded to the job creator
Resource provider receives its collateral back
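The transition rule and payout described above can be sketched roughly as follows. This is an in-memory illustration only: the `Deal`, `Refund`, and `Forfeit` names are hypothetical, and the real logic would live in the smart contracts rather than in a Go struct.

```go
package main

import (
	"errors"
	"fmt"
)

// AgreementState reuses the state names from the list above. The type and
// everything else in this sketch are hypothetical, not CoopHive's code.
type AgreementState string

const (
	DealNegotiating  AgreementState = "DealNegotiating"
	DealAgreed       AgreementState = "DealAgreed"
	DealForfeited    AgreementState = "DealForfeited"
	ResultsSubmitted AgreementState = "ResultsSubmitted"
)

// Deal is a stand-in for the on-chain deal record.
type Deal struct {
	State      AgreementState
	Payment    uint64 // escrowed by the job creator
	Collateral uint64 // escrowed by the resource provider
}

// Refund records who gets what back when a deal is forfeited.
type Refund struct {
	ToJobCreator       uint64
	ToResourceProvider uint64
}

// ErrNotForfeitable is returned when the deal is not in DealAgreed.
var ErrNotForfeitable = errors.New("deal can only be forfeited while in DealAgreed")

// Forfeit moves a deal from DealAgreed to DealForfeited, returning the
// payment to the job creator and the collateral to the resource provider.
func Forfeit(d *Deal) (Refund, error) {
	if d.State != DealAgreed {
		return Refund{}, fmt.Errorf("%w: current state is %s", ErrNotForfeitable, d.State)
	}
	r := Refund{ToJobCreator: d.Payment, ToResourceProvider: d.Collateral}
	d.State = DealForfeited
	d.Payment, d.Collateral = 0, 0
	return r, nil
}

func main() {
	d := &Deal{State: DealAgreed, Payment: 100, Collateral: 20}
	r, err := Forfeit(d)
	fmt.Println(r, err, d.State)

	// A second attempt is rejected because the deal has left DealAgreed.
	_, err = Forfeit(d)
	fmt.Println(err)
}
```

Guarding on a single source state keeps forfeiture from racing with result submission: once results are in, the existing accept/mediate/timeout paths apply instead.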

@lukemarsden
Contributor

Hmm, this is an interesting idea. However, I think a lot of our current problems would be solved by simply catching errors better and returning them to the user in the results bundle. Also, interpreting errors in the results bundle in the CLI and presenting them as actual errors. Maybe we could do that within the context of the current smart contracts etc, rather than changing the protocol itself?

@lukemarsden
Contributor

I guess what I described is complicated by the fact that if bacalhau fails to run the job, we don't have a CID to return. Maybe we need another field in the result type to include an error message instead of a CID?
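One way to picture the extra field is sketched below. The `JobResult` type, its field names, and the JSON tags are assumptions made for illustration; they are not taken from the existing result type.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// JobResult is a hypothetical result type carrying either a CID or an error.
type JobResult struct {
	DealID string `json:"deal_id"`
	DataID string `json:"data_id,omitempty"` // CID of the results bundle; empty if the job never produced one
	Error  string `json:"error,omitempty"`   // set when the job could not be run
}

// ErrExecutorFailed is an example error used only by this sketch.
var ErrExecutorFailed = errors.New("executor failed to start job")

// NewResult builds a result from whatever the executor returned.
func NewResult(dealID, cid string, runErr error) JobResult {
	res := JobResult{DealID: dealID, DataID: cid}
	if runErr != nil {
		res.Error = runErr.Error()
	}
	return res
}

// Failed reports whether the result carries an error instead of a usable CID.
func (r JobResult) Failed() bool { return r.Error != "" }

func main() {
	res := NewResult("deal-1", "", ErrExecutorFailed)
	out, err := json.Marshal(res)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out), res.Failed())
}
```

With something like this, the CLI could check `Failed()` and surface the message as an actual error rather than trying to fetch an empty CID.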

@polus-arcticus
Contributor Author

polus-arcticus commented Jun 6, 2024

> our current problems would be solved by simply catching errors better and returning them to the user in the results bundle
> what I described is complicated by the fact that if bacalhau fails to run the job, we don't have a CID to return.

Indeed, our error in question is here https://github.com/CoopHive/coophive/blob/2e88aedce706158c4cb46176e07e2c2a2746cc1c/pkg/resourceprovider/controller.go#L422-L425

I suppose the question is how to handle two kinds of errors: those that occur on the resource provider's machine but outside bacalhau, and those that occur on the machine and inside bacalhau. Both show up at that permalink without any real indication of which is which. One kind will pipe stderr to the CID fine; the other will produce the blank string.

> Maybe we need another field in the result type to include an error message instead of a CID?

At the permalink above, I can change the forfeit logic to stop the spinner and show an error along the lines of 'the resource provider had trouble starting your job, your payment and collateral have been disbonded'.
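A rough sketch of how the two kinds of failure could be told apart and turned into a user-facing message is below. It assumes the resource provider code tags each error with its origin at the point where it is raised; `ExecError`, `Origin`, and `UserMessage` are hypothetical names, and nothing here reflects how controller.go works today.

```go
package main

import (
	"errors"
	"fmt"
)

// Origin says where a failure came from. Hypothetical classification.
type Origin int

const (
	OriginJob      Origin = iota // the requested computation itself failed
	OriginProvider               // the resource provider could not start or finish the job
)

// ExecError wraps an underlying error with its origin.
type ExecError struct {
	Origin Origin
	Err    error
}

func (e *ExecError) Error() string { return e.Err.Error() }
func (e *ExecError) Unwrap() error { return e.Err }

// ErrExecutorUnavailable is an example error used only by this sketch.
var ErrExecutorUnavailable = errors.New("executor unavailable")

// ProviderFailureMessage is the text shown when the provider is at fault.
const ProviderFailureMessage = "the resource provider had trouble starting your job, your payment and collateral have been disbonded"

// UserMessage picks the text to show once the spinner has been stopped.
func UserMessage(err error) string {
	var ee *ExecError
	if errors.As(err, &ee) && ee.Origin == OriginProvider {
		return ProviderFailureMessage
	}
	return fmt.Sprintf("job failed: %v", err)
}

func main() {
	// errors.As still finds the ExecError after further wrapping.
	wrapped := fmt.Errorf("running job: %w", &ExecError{Origin: OriginProvider, Err: ErrExecutorUnavailable})
	fmt.Println(UserMessage(wrapped))
	fmt.Println(UserMessage(&ExecError{Origin: OriginJob, Err: ErrExecutorUnavailable}))
}
```

Errors that carry no origin at all fall through to the generic "job failed" message, which matches today's behaviour of attributing failures to the computation.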

@polus-arcticus polus-arcticus requested a review from mactus13 June 26, 2024 07:59