
Feat/onboarding bot #18

Conversation

Keyrxng
Contributor

@Keyrxng Keyrxng commented Sep 18, 2024

Resolves #17
Requires #16

@Keyrxng Keyrxng marked this pull request as ready for review September 18, 2024 01:00
@Keyrxng Keyrxng mentioned this pull request Sep 21, 2024
@Keyrxng
Contributor Author

Keyrxng commented Oct 1, 2024

Some QA: ubq-testing/ask-plugin#2

Keep in mind that my DB only has embeddings for readmes. The current approach uses #16 with sshivaditya2019's original DB function, with the current_id param removed, so it compares just the query embedding against all stored embeddings. It uses a threshold of 0.6 and direct prompts, and only the top-ranked embedding section is injected into the existing ctx.
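For clarity, a minimal in-memory sketch of that retrieval step (query embedding vs. all stored embeddings, 0.6 threshold, top hit only). All names here are illustrative; the real comparison runs inside the DB function:

```typescript
// Sketch: compare the query embedding against every stored embedding,
// keep hits at or above the 0.6 threshold, and return only the top match.
interface StoredEmbedding {
  content: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topMatch(
  query: number[],
  stored: StoredEmbedding[],
  threshold = 0.6
): StoredEmbedding | null {
  let best: StoredEmbedding | null = null;
  let bestScore = threshold;
  for (const row of stored) {
    const score = cosineSimilarity(query, row.embedding);
    if (score >= bestScore) {
      bestScore = score;
      best = row;
    }
  }
  return best;
}
```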

Once the DB starts to scale, the signal-to-noise ratio will drop, so I intend to implement a couple of additional search functions.

  • type: essentially a classification of the prompt (setup_instructions, etc.). Requires one zero-shot GPT-4 classification call.
  • metadata: we can index using JSON keys, as mentioned by mentlegen/whilefoo I think, so each embedding carries a metadata interface, something like:
```ts
type Metadata = {
  repoNodeId: string;
  issueNodeId: string;
  authorAssociation: string;
};
```

Then we create a similar search fn that indexes on those keys, which lets us easily restrict the scope of the embedding-context search based on:

  • webhook payload details: repo, issue
  • categorization of text: setup_instructions | complete task summary | task spec | etc...
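A hedged sketch of what that metadata-scoped search could look like. In practice this would be a Postgres function filtering on JSONB keys before ranking by similarity; this in-memory version (with hypothetical names) just illustrates the shape of the filter:

```typescript
// Hypothetical metadata-filtered search: restrict candidate embeddings by the
// webhook-payload metadata keys and by category before any similarity ranking.
type Metadata = {
  repoNodeId: string;
  issueNodeId: string;
  authorAssociation: string;
};

type Category = "setup_instructions" | "task_summary" | "task_spec";

interface EmbeddingRow {
  content: string;
  embedding: number[];
  type: Category;
  metadata: Metadata;
}

function searchByMetadata(
  rows: EmbeddingRow[],
  filter: Partial<Metadata> & { type?: Category }
): EmbeddingRow[] {
  return rows.filter((row) => {
    if (filter.type && row.type !== filter.type) return false;
    if (filter.repoNodeId && row.metadata.repoNodeId !== filter.repoNodeId) return false;
    if (filter.issueNodeId && row.metadata.issueNodeId !== filter.issueNodeId) return false;
    if (filter.authorAssociation && row.metadata.authorAssociation !== filter.authorAssociation) return false;
    return true;
  });
}
```

The similarity ranking from the current approach would then run only over the filtered rows.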

scenario:

Help me start this task? I'm stuck on this issue...

classify > [setup, summary, spec] > obtain payload meta > 3x embedding search, one in each category, use the best > gptDecideContext() > gptContext + prompt

I'm stuck on this PR, Review this PR for me...

classify > [summary, spec, sourceCode] (diff and source kept separate) > meta > 3x search > gptDecideContext() >
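Both scenarios share one pipeline shape. A sketch of that orchestration, where classifyPrompt, searchCategory, and gptDecideContext are all stand-ins for pieces that don't exist yet:

```typescript
// Sketch of the proposed flow: classify the prompt, run one embedding search
// per resulting category (scoped by payload metadata), keep the top hit from
// each, then hand those to a context-distillation step. Every function
// parameter here is a hypothetical stand-in, injected so the sketch is testable.
type SearchCategory = "setup_instructions" | "task_summary" | "task_spec" | "source_code";

async function buildContext(
  prompt: string,
  payloadMeta: { repoNodeId: string; issueNodeId: string },
  classifyPrompt: (p: string) => Promise<SearchCategory[]>, // one zero-shot GPT-4 call
  searchCategory: (c: SearchCategory, meta: { repoNodeId: string; issueNodeId: string }) => Promise<string[]>,
  gptDecideContext: (chunks: string[]) => Promise<string>   // distills/truncates the ctx
): Promise<string> {
  const categories = await classifyPrompt(prompt);          // e.g. [setup, summary, spec]
  // one embedding search per category, scoped by the webhook payload metadata
  const perCategory = await Promise.all(
    categories.map((c) => searchCategory(c, payloadMeta))
  );
  // keep the top-ranked hit from each category, then let the model distill
  const best = perCategory.flatMap((hits) => hits.slice(0, 1));
  return gptDecideContext(best);
}
```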

Like shiv said about context distillation: we used to have the gptDecideContext fn, where we'd feed it the entire ctx and have it truncate it. I think that would be a good thing to bring back.

Idk if it's overkill, but we do have the entire convo history, so we could fetch embeddings based on conversational context prior to truncating, if we aren't already. For sure we'll have to be on Supabase premium shortly I think lol.
