Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update data plane SDK definition #11

Merged
merged 7 commits into from
Mar 5, 2024
Merged

Update data plane SDK definition #11

merged 7 commits into from
Mar 5, 2024

Conversation

haruska
Copy link
Contributor

@haruska haruska commented Feb 28, 2024

The Pinecone Golang SDK has not been actively developed in over two years. In that time, much has changed in PineconeDB. This PR:

  • Updates the data plane protobuf definition to the most current and regenerates the client code.
  • Makes the generated code internal to the module, reducing the SDK surface area
  • Adds the new "pinecone" package that contains the improved SDK definition and implementation
  • Adds wrappers around data plane gRPC calls
  • Adds test harness and cursory tests for the data plane interface calls
  • Adds CI github actions to exec tests on PRs

There is follow-on work that still needs to happen including better testing (see GH Issue #12) and documentation.

@haruska haruska force-pushed the update_data_plane branch 2 times, most recently from 300e63d to 843ed3d Compare February 29, 2024 23:07
@haruska haruska marked this pull request as ready for review March 1, 2024 00:01
@haruska haruska force-pushed the update_data_plane branch from 498f1bf to 948e9b7 Compare March 1, 2024 16:05
Copy link
Contributor

@austin-denoble austin-denoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work here getting this SDK back in business, overall the approach makes sense and feels clean. Caveat to my review being that I'm not that familiar with Go.

Left a lot of questions and nits but I don't necessarily consider any of that blocking. Again, nice job, looking forward to seeing this come together.

}
}

func (idx *IndexConnection) akCtx(ctx context.Context) context.Context {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What's the akCtx stand for here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"api key context" ... as in a new context with the api key added to the metadata.

Other thing of note, anything starting with a lowercase is not exported. So this is a private method on IndexConnection.

Comment on lines +256 to +261
namespaceSummaries := make(map[string]*NamespaceSummary)
for key, value := range res.Namespaces {
namespaceSummaries[key] = &NamespaceSummary{
VectorCount: value.VectorCount,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: We do some mapping before returning responses for FetchVectors, ListVectors, DescribeIndexStats, and query, and I suppose I'm mainly curious why that is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mainly to not expose the internal struct types from the generated code. This allows us to have a clean and well documented interface in the pinecone package without the user of the SDK having to dig into internal code.

Comment on lines +63 to +65
if err != nil {
return 0, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I'm not sure what the errors that are returned from the gRPC calls look like, but we may need to eventually think about centralized mapping / wrapping of things if that makes sense in Go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Go, every error is an error. Unless we wanted different messages, these are generally easier to maintain as pass-through.

return err
}

type UpsertVectorsRequest struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Some structs are defined in-line in this file and some are in models.go, would it make sense to keep everything in one place or is it more reasonable to define these things near where they're used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sudo-arbitrary. The exported structs defined here are the Request and Response structs for exported actions. The data structs are in models.go. I guess because those have a higher probability of being reused outside of index connection?

This is something that can be re-organized in the future without breaking uses of the SDK. Everything is exported to the pinecone package regardless of filename.

import (
"context"
"github.com/google/uuid"
"github.com/stretchr/testify/assert"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read up a bit on the testing framework in case anyone else is curious: https://github.com/stretchr/testify.

This was the one you mentioned was controversial? Seems fine to me, although I'm not sure if that's something we need to consider at this point.

Also less relevant now, but should we consider having a specific folder path for test files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, more that using anything not built in is controversial. This package allows for things like setup and tear-down on suite, sub-suite, test levels. It's just more like other testing frameworks folks are used to. Seemed like a good addition.

Copy link
Contributor

@jhamon jhamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good start. Obviously more tests will need to be added over time.

func (ts *IndexConnectionTests) TestFetchVectors() {
req := &FetchVectorsRequest{
Ids: ts.vectorIds,
Namespace: ts.namespace,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My golang is rusty, but is there a way to make Namespace optional in these various request objects? On the one hand I like pushing people to be explicit, but on the other hand historically our other SDKs have treated namespace as optional and automatically applied the default "" when nothing is passed.

In Typescript we tried pulling the namespace up one level so it's more of a config property baked into the thing analgous to your IndexConnection. The advantage of that is the user only has to specify it once and doesn't have to remember to pass the correct namespace on every operation. The thinking was that seeing this param repeated across every method suggests it's really part of the connection config, and having to repeat it on every call feels unnecessarily repetitious and error prone. Food for thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as-is these are optional. Golang sets any omitted value to the "zero value" of that type. In string type's case, that's "".

I do like the idea of capturing an optional Namespace in the IndexConnection struct and removing it from the individual calls. There aren't cases where folks would be changing the namespace per call, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There aren't cases where folks would be changing the namespace per call, right?

It's possible they would want to work with multiple namespaces, but based on what I've seen in our docs and examples it's not common. If somebody wanted to work with multiple namespaces they could create two client instances and it wouldn't be a huge deal, certainly less error prone than having to remember to pass a namespace in every method call. Nobody has complained about this choice in the typescript client AFAIK. As an organizational concept, it's kind of a hangover from pre-serverless days when people needed access to a lighter weight form of multitenancy to save the cost of spinning up a completely separate (expensive) pod index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a84740c

This cleaned up IndexConnection quite a bit! Note, IndexConnection.Namespace is exported and can be changed by the user. The new namespace would then be used in subsequent calls.

@haruska haruska merged commit 128d9b8 into main Mar 5, 2024
3 checks passed
@haruska haruska deleted the update_data_plane branch March 5, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants