ci-matrix: add partition-tests-inpackage #277

Draft: wants to merge 1 commit into base: main
2 changes: 2 additions & 0 deletions .project/golangci-lint.yml
@@ -12,6 +12,8 @@ issues:
   exclude-rules:
     - linters: [revive]
       text: 'should have comment .*or be unexported'
+    - linters: [revive]
+      text: 'should have a package comment'
     - linters: [stylecheck]
       text: 'ST1000: at least one file in a package should have a package comment'
     - linters: [errcheck]
115 changes: 79 additions & 36 deletions cmd/tool/matrix/matrix.go
@@ -32,9 +32,10 @@ func Run(name string, args []string) error {
 }

 type options struct {
-	numPartitions      uint
-	timingFilesPattern string
-	debug              bool
+	numPartitions           uint
+	timingFilesPattern      string
+	partitionTestsInPackage string
+	debug                   bool

 	// shims for testing
 	stdin  io.Reader
@@ -52,6 +53,8 @@ func setupFlags(name string) (*pflag.FlagSet, *options) {
 		"number of parallel partitions to create in the test matrix")
 	flags.StringVar(&opts.timingFilesPattern, "timing-files", "",
 		"glob pattern to match files that contain test2json events, ex: ./logs/*.log")
+	flags.StringVar(&opts.partitionTestsInPackage, "partition-tests-in-package", "",
+		"partition the tests in a single package instead of partitioning by package")
 	flags.BoolVar(&opts.debug, "debug", false,
 		"enable debug logging")
 	return flags, opts
@@ -71,6 +74,18 @@ The output of the command is a JSON object that can be used as the matrix
 strategy for a test job.


+When the --partition-tests-in-package flag is set to the name of a package, this
+command will output a matrix that partitions the tests in that one package. In
+this mode the command reads a list of test names from stdin.
+
+Example
+
+    echo -n "::set-output name=matrix::"
+    go test --list github.com/example/pkg | \
+        %[1]s --partitions 5 \
+        --partition-tests-in-package github.com/example/pkg \
+        --timing-files ./*.log --max-age-days 10
+
 Flags:
 `, name)
 	flags.SetOutput(out)
@@ -89,7 +104,7 @@ func run(opts options) error {
 		return fmt.Errorf("--timing-files is required")
 	}

-	pkgs, err := readPackages(opts.stdin)
+	inputs, err := readPackagesOrFiles(opts.stdin)
 	if err != nil {
 		return fmt.Errorf("failed to read packages from stdin: %v", err)
 	}
@@ -100,16 +115,16 @@ func run(opts options) error {
 	}
 	defer closeFiles(files)

-	pkgTiming, err := packageTiming(files)
+	timing, err := aggregateByName(files, opts.partitionTestsInPackage)
 	if err != nil {
 		return err
 	}

-	buckets := bucketPackages(packagePercentile(pkgTiming), pkgs, opts.numPartitions)
-	return writeMatrix(opts.stdout, buckets)
+	buckets := createBuckets(percentile(timing), inputs, opts.numPartitions)
+	return writeMatrix(opts, buckets)
 }

-func readPackages(stdin io.Reader) ([]string, error) {
+func readPackagesOrFiles(stdin io.Reader) ([]string, error) {
 	var packages []string
 	scan := bufio.NewScanner(stdin)
 	for scan.Scan() {
@@ -143,27 +158,39 @@ func parseEvent(reader io.Reader) (testjson.TestEvent, error) {
 	return event, err
 }

-func packageTiming(files []*os.File) (map[string][]time.Duration, error) {
+func aggregateByName(files []*os.File, pkgName string) (map[string][]time.Duration, error) {
 	timing := make(map[string][]time.Duration)
 	for _, fh := range files {
 		exec, err := testjson.ScanTestOutput(testjson.ScanConfig{Stdout: fh})
 		if err != nil {
 			return nil, fmt.Errorf("failed to read events from %v: %v", fh.Name(), err)
 		}

+		if pkgName != "" {
+			pkg := exec.Package(pkgName)
+			if pkg == nil {
+				return nil, nil
+			}
+
+			for _, tc := range pkg.TestCases() {
+				timing[tc.Test.Name()] = append(timing[tc.Test.Name()], tc.Elapsed)
+			}
+			continue
+		}
+
 		for _, pkg := range exec.Packages() {
 			timing[pkg] = append(timing[pkg], exec.Package(pkg).Elapsed())
 		}
 	}
 	return timing, nil
 }

-func packagePercentile(timing map[string][]time.Duration) map[string]time.Duration {
+func percentile(timing map[string][]time.Duration) map[string]time.Duration {
 	result := make(map[string]time.Duration)
-	for pkg, times := range timing {
+	for group, times := range timing {
 		lenTimes := len(times)
 		if lenTimes == 0 {
-			result[pkg] = 0
+			result[group] = 0
 			continue
 		}

@@ -173,10 +200,10 @@ func packagePercentile(timing map[string][]time.Duration) map[string]time.Durati

 		r := int(math.Ceil(0.85 * float64(lenTimes)))
 		if r == 0 {
-			result[pkg] = times[0]
+			result[group] = times[0]
 			continue
 		}
-		result[pkg] = times[r-1]
+		result[group] = times[r-1]
 	}
 	return result
 }
@@ -187,18 +214,18 @@ func closeFiles(files []*os.File) {
 	}
 }

-func bucketPackages(timing map[string]time.Duration, packages []string, n uint) []bucket {
-	sort.SliceStable(packages, func(i, j int) bool {
-		return timing[packages[i]] >= timing[packages[j]]
+func createBuckets(timing map[string]time.Duration, item []string, n uint) []bucket {
+	sort.SliceStable(item, func(i, j int) bool {
+		return timing[item[i]] >= timing[item[j]]
 	})

 	buckets := make([]bucket, n)
-	for _, pkg := range packages {
+	for _, name := range item {
 		i := minBucket(buckets)
-		buckets[i].Total += timing[pkg]
-		buckets[i].Packages = append(buckets[i].Packages, pkg)
+		buckets[i].Total += timing[name]
+		buckets[i].Items = append(buckets[i].Items, name)
 		log.Debugf("adding %v (%v) to bucket %v with total %v",
-			pkg, timing[pkg], i, buckets[i].Total)
+			name, timing[name], i, buckets[i].Total)
 	}
 	return buckets
 }
@@ -211,16 +238,18 @@ func minBucket(buckets []bucket) int {
 		case min < 0 || b.Total < min:
 			min = b.Total
 			n = i
-		case b.Total == min && len(buckets[i].Packages) < len(buckets[n].Packages):
+		case b.Total == min && len(buckets[i].Items) < len(buckets[n].Items):
 			n = i
 		}
 	}
 	return n
 }

 type bucket struct {
-	Total    time.Duration
-	Packages []string
+	Total time.Duration
+	// Items is the name of packages in the default mode, or the name of tests
+	// in partition-by-test mode.
+	Items []string
 }

 type matrix struct {
@@ -231,32 +260,46 @@ type Partition struct {
 	ID               int    `json:"id"`
 	EstimatedRuntime string `json:"estimatedRuntime"`
 	Packages         string `json:"packages"`
+	Tests            string `json:"tests,omitempty"`
 	Description      string `json:"description"`
 }

-func writeMatrix(out io.Writer, buckets []bucket) error {
-	m := matrix{Include: make([]Partition, len(buckets))}
+func writeMatrix(opts options, buckets []bucket) error {
+	m := matrix{Include: make([]Partition, 0, len(buckets))}
 	for i, bucket := range buckets {
+		if len(bucket.Items) == 0 {
+			continue
+		}
+
 		p := Partition{
 			ID:               i,
 			EstimatedRuntime: bucket.Total.String(),
-			Packages:         strings.Join(bucket.Packages, " "),
 		}
-		if len(bucket.Packages) > 0 {
-			var extra string
-			if len(bucket.Packages) > 1 {
-				extra = fmt.Sprintf(" and %d others", len(bucket.Packages)-1)
-			}
-			p.Description = fmt.Sprintf("partition %d - package %v%v",
-				p.ID, testjson.RelativePackagePath(bucket.Packages[0]), extra)
+
+		if opts.partitionTestsInPackage != "" {
+			p.Packages = opts.partitionTestsInPackage
+			p.Description = fmt.Sprintf("partition %d with %d tests", p.ID, len(bucket.Items))
+			p.Tests = fmt.Sprintf("-run='^%v$'", strings.Join(bucket.Items, "$,^"))


I believe this should be:

Suggested change
-p.Tests = fmt.Sprintf("-run='^%v$'", strings.Join(bucket.Items, "$,^"))
+p.Tests = fmt.Sprintf("-run ^%v$", strings.Join(bucket.Items, "|"))

See:

https://github.com/pulumi/pulumi/blob/86fbe80b8d99fe9346bad42f4f9c88b81a50a3db/scripts/get-job-matrix.py#L317-L327

Member Author

Oh ya, the join needs to be a pipe not a comma for sure! Good catch.

I think you're right that we don't need the ^ and $ around each test name, but I'll need to test that out again. go test has some strange handling for regex (ex: golang/go#39904). We'll need to add ( and ) if we remove those.

Both = and a space should work for the separator. The = is nice because you can pass it as a single quoted arg instead of it being two separate arguments.

I believe the whole string does need to be quoted with single quotes so that the pipes and $ are not interpreted by the shell. In your case that may not be a problem because you're running it from python, but most of the time I expect this to be read from bash.

@AaronFriel Sep 8, 2022

Yeah, I wish there were a simpler (& more efficient) way to provide a list of tests.

I don't think the whole string needs to be quoted - or you should let the user do the quoting. It's easier for them to add quotes of the appropriate kind, and GitHub Actions allows plenty of ways to inject a variable or string into a script. For example, passing the value through an env var and quoting the expansion ensures that any special characters are handled correctly:

  env:
    TESTS: ${{ inputs.tests }}
  run:
    echo "$TESTS"

Member Author

My primary goal with this command is to make it easy to integrate into a github workflow. Having to set a value into an env var just to use it does not make it easy. I expect someone to be able to do something like this and not have to worry about escaping or formatting the values.

I see your use case is quite different. You have a lot of code already in place, and you're looking for a tool to perform the test bucketing.

I think for your use case we could add a --format flag to this command. The default, --format=github-action-matrix, would produce JSON output that you can use directly in a github actions matrix. For your use case we could do --format=json, which would output the package list and test list as arrays (instead of a space separated string, or a -run flag). That should make it easier for you to consume the output, while still supporting my primary goal of making it easy to use in a github workflow.


That makes sense to me.
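
Editor's note: a rough sketch of what the proposed --format=json output could look like, with packages and tests emitted as arrays. The flag does not exist in this PR; the `jsonPartition` type, the `encodePartitions` helper, and the field names are hypothetical and only mirror the existing `Partition` struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonPartition is a hypothetical output shape for a --format=json mode.
// Unlike Partition in the diff, Packages and Tests are arrays rather than
// a space separated string or a preformatted -run flag.
type jsonPartition struct {
	ID               int      `json:"id"`
	EstimatedRuntime string   `json:"estimatedRuntime"`
	Packages         []string `json:"packages"`
	Tests            []string `json:"tests,omitempty"`
}

// encodePartitions marshals the partitions to a compact JSON string.
func encodePartitions(parts []jsonPartition) (string, error) {
	raw, err := json.Marshal(parts)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	out, err := encodePartitions([]jsonPartition{{
		ID:               0,
		EstimatedRuntime: "7s",
		Packages:         []string{"github.com/example/pkg"},
		Tests:            []string{"TestOne", "TestTwo"},
	}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A consumer that builds its own command line could then quote or join the test names however its shell or scripting language requires.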


+			m.Include = append(m.Include, p)
+			continue
 		}
+
+		p.Packages = strings.Join(bucket.Items, " ")
+
+		var extra string
+		if len(bucket.Items) > 1 {
+			extra = fmt.Sprintf(" and %d others", len(bucket.Items)-1)
+		}
+		p.Description = fmt.Sprintf("partition %d - package %v%v",
+			p.ID, testjson.RelativePackagePath(bucket.Items[0]), extra)

-		m.Include[i] = p
+		m.Include = append(m.Include, p)
 	}

 	log.Debugf("%v\n", debugMatrix(m))

-	err := json.NewEncoder(out).Encode(m)
+	err := json.NewEncoder(opts.stdout).Encode(m)
 	if err != nil {
 		return fmt.Errorf("failed to json encode output: %v", err)
 	}