Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deleting dataset doesn't delete tables (and creating a table that exists hangs forever) #341

Open
coxley opened this issue Jul 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@coxley
Copy link

coxley commented Jul 17, 2024

What happened?

It's common in our tests to bootstrap a dataset, let library code create as many tables as needed, then cleanup the dataset after the test finishes. Removing a dataset should remove all tables and other resources inside it.

The emulator doesn't appear to do this. Additionally, the errors returned from the emulator are 500 which the Google SDKs (at least for Go) retry indefinitely on. This is likely the cause of the deadlock: source / source

What did you expect to happen?

Two things:

  • When errors happen, return them as non-internalError. The Google SDKs retry on these indefinitely because they're treated as infrastructural errors on their side.
  • When a dataset is deleted, clean up its tables. It's much easier from a test to delete a single dataset, than to record all tables a library may have created during a test.

How can we reproduce it (as minimally and precisely as possible)?

cd $(mktemp -d)
go mod init test
# Copy below test file to main_test.go
go mod tidy
go test .
package main

import (
	"context"
	"sync"
	"testing"

	"cloud.google.com/go/bigquery"
	"github.com/stretchr/testify/require"
	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
	"google.golang.org/api/option"
)

const (
	bqPort      = "9050/tcp"
	testProject = "project"
)

// startBigQuery spins up a test container and blocks until it's ready. Only one
// container can be started per test binary.
//
// Sharing a container across tests improves suite duration significantly
var startBigQuery = sync.OnceValues(func() (testcontainers.Container, error) {
	ctx := context.Background()
	return testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image: "ghcr.io/goccy/bigquery-emulator:latest",
			Cmd: []string{
				"--project=" + testProject,
			},
			ExposedPorts: []string{bqPort},
			WaitingFor:   wait.ForExposedPort(),
		},
		Started: true,
	})
})

func TestBasic(t *testing.T) {
	t.Run("one", func(t *testing.T) {
		testHelper(t)
	})

	t.Run("two", func(t *testing.T) {
		testHelper(t)
	})
}

func testHelper(t *testing.T) {
	ctx := context.Background()

	// Startup emulator and create client
	container, err := startBigQuery()
	require.NoError(t, err)

	addr, err := container.Endpoint(ctx, "")
	require.NoError(t, err)

	endpoint := "http://" + addr
	client, err := bigquery.NewClient(
		ctx,
		testProject,
		option.WithoutAuthentication(),
		option.WithEndpoint(endpoint),
	)
	require.NoError(t, err)

	// Create dataset and table within
	ds := client.Dataset("main")
	err = ds.Create(ctx, nil)
	require.NoError(t, err)

	// Cleanup entire dataset on test finish
	t.Cleanup(func() {
		require.NoError(t, ds.Delete(ctx))
	})

	t.Log("creating table inside dataset")
	table := ds.Table("data")
	err = table.Create(ctx, &bigquery.TableMetadata{
		Schema: bigquery.Schema{
			&bigquery.FieldSchema{Name: "value", Type: bigquery.NumericFieldType},
		},
	})
	require.NoError(t, err)

	// This prevents the test from deadlocking indefinitely, but removing the entire
	// dataset should be enough.
	//
	// Either that, or returning an appropriate error.
	//
	// t.Cleanup(func() {
	// 	require.NoError(t, table.Delete(ctx))
	// })
	t.Log("table created")
}

func ignoreErr[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

Anything else we need to know?

No response

@coxley coxley added the bug Something isn't working label Jul 17, 2024
@ohaibbq
Copy link
Contributor

ohaibbq commented Jul 17, 2024

Does it work if you use dataset.DeleteWithContents()? Tables are not cleared from datasets during deletion by default.

I agree that the internalError response status causes lots of headaches with client library retries.

@coxley
Copy link
Author

coxley commented Jul 18, 2024

@ohaibbq: Oh dang, I completely missed that in the documentation. Thank you very much.

I guess my real issue is with internalError then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants