Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

💅 Update text2vec-azure-openai to utilize isAzure: true flag and mark resourceName + deploymentId as optional #196

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

flipace
Copy link
Contributor

@flipace flipace commented Sep 18, 2024

This relates to the changes in weaviate/weaviate#5776

With this adjustment, devs can use the text2VecAzureOpenAI vectorizer, without specifying deploymentId or resourceName upfront for their collection.

Instead, they may provide the headers X-Azure-Deployment-Id and X-Azure-Resource-Name in their requests to set these.

vectorizer.text2VecAzureOpenAI({
  vectorizeCollectionName: false,
})

Internally, using text2VecAzureOpenAI will set the an isAzure: true flag for the OpenAI vectorizer, so it understands that the Azure logic must be used.

…rk `resourceName` + `deploymentId` as optional

This relates to the changes in weaviate/weaviate#5776
@weaviate-git-bot
Copy link

Great to see you again! Thanks for the contribution.

beep boop - the Weaviate bot 👋🤖

PS:
Are you already a member of the Weaviate Slack channel?

Copy link
Contributor

@tsmith023 tsmith023 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Left a few comments mainly around house-keeping otherwise the PR looks great 😁

src/collections/configure/unit.test.ts Outdated Show resolved Hide resolved
Comment on lines +187 to +188
/** Will automatically be set to true. You don't need to set this manually. */
isAzure?: true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is always true, does it still need the ? operator? If not, can we remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the true is always set by the text2VecAzureOpenAI function internally, should not be necessary to be passed by the user - but due to how the types in general are structured I could not easily 1) remove it completely from the config object, nor 2) remove the optional operator, since then the user would be required to supply isAzure: true manually 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, okay I see now. I think this makes sense if the user makes use of the .azureOpenAI method but part of the API is still to allow users to work with the raw types if they so wish. As such, if a user did:

generative: {
  name: 'generative-openai',
  config: {
    deploymentId: config.deploymentId,
    resourceName: config.resourceName,
     baseURL: config.baseURL,
  }
}

then the type system would allow it since isAzure is optional yet the runtime would interpret this as isAzure: undefined, which is a false-y value.

I like the idea of introducing the isAzure flag to the TS client but I think it may be better placed as a pure internal, e.g. not exposed in the user types, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to not expose it - but I could not figure out how to make it work so that it is not marked as unexpected internally where i place it 😬

This part here is what I mean:

CleanShot 2024-10-07 at 11 02 14@2x

I wouldn't know how to tell TS here that this is an expected prop without adding a // @ts-expect-error 🤔

Copy link
Contributor

@tsmith023 tsmith023 Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that isAzure should be removed from the ...Config types and instead interpreted by the client's runtime itself depending on the name of the module. So this would most likely require the addition of generative-azure-openai, alongside text2vec-azure-openai, that is then parsed appropriately in the collection creation logic

There we'd have some boolean clauses to determine whether the module is an azure one, based on the name, and then inject isAzure: true into the config appropriately. IMO, this would be the most consistent for the client/server relationship as I'm sure there will be future refactoring of the server that changes this behaviour. Then, we'd only break the internal relationship rather than the public API

We already do something similar here, wdyt about extending this logic as described above?

If you'd rather not then that's fine, I can add it to my backlog 😁 Also, sorry for the spaghetti of the collection.create method, I've not had the chance to refactor it into a better structure 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - this definitely makes more sense 👍 I didn't look into this part.

I'm drowning a bit in other work right now, but I should be able to look into this in more detail hopefully next week or so 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get round to it this week, I'll ping you on here to let you know. Thanks for your help so far!

src/collections/config/types/generative.ts Show resolved Hide resolved
…passed at all

Since the only indicator for the azureOpenAI config is now the isAzure: true flag, which is set in the vectorizer setup directly, no config object is necessary for it.
@flipace flipace requested a review from tsmith023 October 7, 2024 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants