Skip to content
This repository has been archived by the owner on Aug 28, 2024. It is now read-only.

Refactor to match current design doc #109

Merged
merged 18 commits into from
Jul 21, 2023
Merged

Refactor to match current design doc #109

merged 18 commits into from
Jul 21, 2023

Conversation

nstogner
Copy link
Contributor

@nstogner nstogner commented Jul 18, 2023

  • Refactor cloud switch conditions into interfaces within the cloud pkg
  • Update all bucket URLs to use hashes
  • Add support for params.json (fixes Support for params.json #114)

@nstogner nstogner self-assigned this Jul 18, 2023
@nstogner nstogner changed the title WIP: Refactor to match current design doc Refactor to match current design doc Jul 19, 2023
Makefile Outdated Show resolved Hide resolved
Container string // Example: trainer, loader
Name string // Example: model, model-saved, data
Mounts []BucketMount // Example: model, data, logs
ReadOnly bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this type correspond to a single fs mount or all of them on the container? Mounts has me think it captures all of them but ReadOnly should probably only apply to a single mount point and not all right?

Copy link
Contributor

@samos123 samos123 Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is why it works, only if dataset is nil `r.Cloud.MountBucket gets called another time with a single additional single BucketMount which adds a readonly mount for dataset. See example below:

	if err := r.Cloud.MountBucket(&job.Spec.Template.ObjectMeta, &job.Spec.Template.Spec, model, cloud.MountBucketConfig{
		Name: "model",
		Mounts: []cloud.BucketMount{
			{BucketSubdir: "model", ContentSubdir: "model"},
			{BucketSubdir: "logs", ContentSubdir: "logs"},
		},
		Container: containerName,
		ReadOnly:  false,
	}); err != nil {
		return nil, fmt.Errorf("mounting model: %w", err)
	}

	if dataset != nil {
		if err := r.Cloud.MountBucket(&job.Spec.Template.ObjectMeta, &job.Spec.Template.Spec, dataset, cloud.MountBucketConfig{
			Name: "dataset",
			Mounts: []cloud.BucketMount{
				{BucketSubdir: "data", ContentSubdir: "data"},
			},
			Container: containerName,
			ReadOnly:  true,
		}); err != nil {
			return nil, fmt.Errorf("mounting dataset: %w", err)
		}
	}

	if baseModel != nil {
		if err := r.Cloud.MountBucket(&job.Spec.Template.ObjectMeta, &job.Spec.Template.Spec, baseModel, cloud.MountBucketConfig{
			Name: "basemodel",
			Mounts: []cloud.BucketMount{
				{BucketSubdir: "model", ContentSubdir: "saved-model"},
			},
			Container: containerName,
			ReadOnly:  true,
		}); err != nil {
			return nil, fmt.Errorf("mounting base model: %w", err)
		}
	}

I do think it's a bit confusing since both Brandon and I were similarly confused by this.

edit: Below was my original response:

nice catch. I think ReadOnly needs to move to BucketMount type. Otherwise how would you express this?

model(readonly=False), model-saved(readonly=True), data(readonly=True)

The code further down makes me believe this might be a bug indeed: https://github.com/substratusai/substratus/pull/109/files#diff-a1c985bf4c51fa451b98fc1431485adc3910b38bf897053cd4a881323083008fR66

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCSFuse allows for specifying readonly on an entire volume. The reason why I have multiple .volumes being appended is: I wanted to respect the bucket url reported by a given object instead of assuming that all objects in a given cluster are stored in the same bucket. One nice side effect of this: a Dataset object could eventually be imported into a GKE cluster that references a dataset from S3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR on rational: Every time MountBucket is called it is given an Object to mount from. If this Object has a .status.artifacts.url, MountBucket will respect this url. This means MountBucket is 1:1 with a new .volume. So the context of the .ReadOnly field is OK to be scoped at the .volumes level as well as the .volumeMounts level.

Copy link
Contributor Author

@nstogner nstogner Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably rename to type ObjectBucketMounts struct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation - makes good sense now!

Copy link
Contributor

@brandonjbjelland brandonjbjelland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions worth addressing but overall this is huge and like 95% there

@brandonjbjelland brandonjbjelland self-requested a review July 20, 2023 23:59
@samos123 samos123 self-requested a review July 21, 2023 01:19
samos123
samos123 previously approved these changes Jul 21, 2023
@samos123
Copy link
Contributor

merging and then tagging and releasing v0.5.0

@samos123 samos123 merged commit a94bc63 into main Jul 21, 2023
2 checks passed
@samos123 samos123 deleted the update-to-match-design branch July 21, 2023 05:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for params.json
3 participants