Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support reservation #62

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

support reservation #62

wants to merge 5 commits into from

Conversation

lzlaa
Copy link
Collaborator

@lzlaa lzlaa commented Aug 29, 2024

No description provided.

Copy link
Member

@binacs binacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! LGTM overall just a few comments

@@ -168,6 +168,7 @@ func (o *Options) Flags() (nfs cliflag.NamedFlagSets) {
fs.Float32Var(&o.ComponentConfig.ClientConnection.QPS, "kube-api-qps", o.ComponentConfig.ClientConnection.QPS, "QPS to use while talking with kubernetes apiserver. This parameter overrides the value defined in config file, which is specified in --config.")
fs.Int32Var(&o.ComponentConfig.ClientConnection.Burst, "kube-api-burst", o.ComponentConfig.ClientConnection.Burst, "burst to use while talking with kubernetes apiserver. This parameter overrides the value defined in config file, which is specified in --config.")

fs.Int64Var(&o.ComponentConfig.ReservationTimeOutSeconds, "reservation-ttl", o.ComponentConfig.ReservationTimeOutSeconds, "how long resources will be reserved (for resource reservation).")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this to #L185-L192 for more fine-grained configuration in the future

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -4,5 +4,9 @@ name: godel-demo-default
nodes:
- role: control-plane
image: kindest/node:v1.21.1
- role: worker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need more workers for demo cluster?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, reservation needs two worker nodes at least, and one single worker node may cause unexpected error.

newPrr := prr.DeepCopy()
// skip pod reservation request is not in reserved status.
if shouldSkipUpdateReservationStatus(prr) {
klog.V(4).InfoS("skip updating pod reservation request, pod reservation request is not in reserved status", "prr", podutil.GetReservationKey(prr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: skip -> Skip

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
newPrr.Status.Phase = schedulingv1a1.ReservationMatched
} else {
// TODO: add to count and set in unavaliable status.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to go on in this situation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -35,12 +30,17 @@ import (
"k8s.io/client-go/tools/cache"
"k8s.io/klog/v2"

nodev1alpha1 "github.com/kubewharf/godel-scheduler-api/pkg/apis/node/v1alpha1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should packages from other repos (godel-scheduler-api, katalyst-api) be located in above?

@@ -84,4 +85,8 @@ type UnitFrameworkHandle interface {
FindStore(commonstore.StoreName) commonstore.Store

GetMaxWaitingDeletionDuration() time.Duration

// TODO: cleanup the following function methods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move theis line to #L87 considering that the GetMaxWaitingDeletionDuration should also be cleand up

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

import v1 "k8s.io/api/core/v1"

const (
ResourceBytedanceSocket v1.ResourceName = "bytedance.com/socket"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double check that whether bytedance.com may be claimed in open source projects

@lzlaa lzlaa force-pushed the dev/support-reservation branch 2 times, most recently from 890575a to 60bfc95 Compare September 10, 2024 11:31
@lzlaa lzlaa force-pushed the dev/support-reservation branch 2 times, most recently from af5a097 to da3cc83 Compare September 19, 2024 09:27
binacs
binacs previously approved these changes Sep 19, 2024
Copy link
Member

@binacs binacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lzlaa lzlaa changed the title WIP: support reservation support reservation Sep 20, 2024
Please make sure that the Godel Controller Manager component has been deployed in the kind cluster along with other scheduler components as control plane components.

```shell
$ k get po -n godel-system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl or alias k=kubectl

We should ensure that everyone can run these examples easily.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, replace "k" to "kubectl".

godel.bytedance.com/reservation-ttl=7200
```

For example, if you configure the above annotation on a Pod or Deployment, the scheduler will automatically release the reserved resources after 24 hours if there is no consumer for the reserved resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if both of the Deployment and Pod configured this field?

Priority needs to be clearly described.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some notes about the priority

binacs
binacs previously approved these changes Oct 13, 2024
Copy link
Member

@binacs binacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

return
}

fs.StringVar(&o.BindAddress, "address", o.BindAddress, "DEPRECATED: the IP address on which to listen for the --port port (set to 0.0.0.0 for all IPv4 interfaces and :: for all IPv6 interfaces). See --bind-address instead.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to remove those deprecated flags.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is best to keep the behavior consistent with other scheduler components, retain the insecure server configuration, and clean it up later.

return nil
}

func updateDeprecatedInsecureServingOptionsFromAddress(is *apiserveroptions.DeprecatedInsecureServingOptionsWithLoopback, addr string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is unnecessary if deprecated flags are removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is best to keep the behavior consistent with other scheduler components, retain the insecure server configuration, and clean it up later.

As the opinion shows, we need keep these flags.

}

// DefaultLeaderMigrationOptions returns a LeaderMigrationOptions with default values.
func DefaultLeaderMigrationOptions() *LeaderMigrationOptions {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need LeaderMigration. According to design doc https://github.com/kubernetes/enhancements/tree/master/keps/sig-cloud-provider/2436-controller-manager-leader-migration,

Support a migration process for large scale and highly available Kubernetes clusters using the in-tree cloud providers (via kube-controller-manager and kubelet) to their out-of-tree equivalents (via cloud-controller-manager).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, removed it.

}

// TODO
func (opt *ReservationControllerOptions) Validate() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add some validations.

Copy link
Collaborator Author

@lzlaa lzlaa Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No validation requirement of ReservationControllerOptions, so just delete these method.

This document uses a kind cluster as an example to introduce how to enable and use the resource reservation capability of the Godel Scheduler.

## Prerequisites

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how to deploy godel controller manager, is there any documentation, or links maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the docs/features/kind-cluster-setup.md, users can deploy godel controller manager in kind cluster.

@@ -1,3 +1,19 @@
/*
Copyright 2023 The Godel Scheduler Authors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LICENSE

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants