Issue Dedupe #6

Closed
0x4007 opened this issue Sep 8, 2024 · 23 comments · Fixed by #11

Comments

@0x4007
Member

0x4007 commented Sep 8, 2024

On issue.created and issue.edited we should check if there are any open issues that are x% similar within the repository. There should be two configurable percentages:

  1. A warning threshold, perhaps 75% similar as a default
  2. "Match" threshold, 95% similar default.
  • If a "match" is found, the bot should "close as unplanned"
  • If a "warning" level issue has been found, post a comment with all the similar open issues. Target and edit this comment if it currently exists.
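A minimal sketch of the two-threshold check, assuming similarity is the cosine similarity between issue embeddings (later review comments in this thread appear to refer to cosine similarity). The names `Thresholds`, `cosineSimilarity`, and `classify` are hypothetical and not taken from the plugin's code; the defaults mirror the 75% and 95% values proposed above.

```typescript
// Hypothetical names; thresholds are expressed as fractions (0.75 = 75%).
type Verdict = "match" | "warning" | "none";

interface Thresholds {
  warning: number;
  match: number;
}

const DEFAULT_THRESHOLDS: Thresholds = { warning: 0.75, match: 0.95 };

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Map a similarity score onto the spec's two levels; "match" takes priority.
function classify(similarity: number, t: Thresholds = DEFAULT_THRESHOLDS): Verdict {
  if (similarity >= t.match) return "match";
  if (similarity >= t.warning) return "warning";
  return "none";
}
```

Checking the match threshold first means an issue that clears both thresholds is only ever treated as a match.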

It isn't entirely clear to me if this should be a separate plug-in, but this does require coupling to the same vector embeddings database. Another idea is to pass in the authentication details in each vector embeddings related plugin's config.

@0x4007
Member Author

0x4007 commented Sep 8, 2024

@sshivaditya2019 rfc

@sshivaditya2019
Collaborator

This could be managed with a separate plugin if necessary. If issue_content_embeddings is enabled, we can use similarity search; otherwise, we can use other libraries like spaCy. The decision on how to design this depends on whether we are comfortable with increasing coupling between the two plugins or components.

@sshivaditya2019
Collaborator

/start

Contributor

ubiquity-os bot commented Sep 8, 2024

Deadline: Sun, Sep 15, 5:54 AM UTC
Registered Wallet: 0xDAba6e01D15Db560b88C8F426b016801f79e1F69
Tips:
  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

@0x4007
Member Author

0x4007 commented Sep 8, 2024

Reducing coupling is preferred just as long as it doesn't make the setup overly complicated.

@sshivaditya2019
Collaborator

Is it possible to retrieve the currently active plugins from within a plugin?

@0x4007
Member Author

0x4007 commented Sep 8, 2024

> Is it possible to retrieve the currently active plugins from within a plugin?

@gentlementlegen @Keyrxng RFC

I'm assuming by parsing the current config. I'm pretty sure we have a method in our SDK for this. @whilefoo, do you know?

@Keyrxng

This comment was marked as spam.

@0x4007
Member Author

0x4007 commented Sep 8, 2024

You should check the original conversation and pull instead of speculating on how it's implemented.

As I understand it, each vector embedding has an ID (issue ID or comment ID).

A GraphQL query fetches the IDs of contributors' "closed as complete" issues.

We just check those IDs' embeddings.

@Keyrxng
Contributor

Keyrxng commented Sep 8, 2024

> You should check the original conversation

> similar within the repository.

My mistake, I misread this as within the org, and I didn't think there would be any reliance on GraphQL or REST. I thought we'd cover all issues, not just those closed as complete, and that the embeddings would have relevant metadata to help with filtering. When I worked with embeddings previously, such as chunking PDFs, each chunk had relevant metadata, and I was considering each of ours as one chunk.

I guess centralizing the embedding DB was also sort of mentioned, so yeah, my bad. I'll hide the comment and go get another coffee 😂

@whilefoo
Contributor

whilefoo commented Sep 8, 2024

> Is it possible to retrieve the currently active plugins from within a plugin?
>
> @gentlementlegen @Keyrxng RFC
>
> I'm assuming by parsing the current config. I'm pretty sure we have a method in our SDK for this. @whilefoo, do you know?

No, we don't have that method.

Since both plugins would share the same database, maybe it would be better to keep it as one plugin?

@sshivaditya2019
Collaborator

@0x4007
I think it would be better to make this an extension of issue-comment-embeddings, something that can be enabled if required.

To get the facts straight:

  • This would be a part of the issue-comment-embeddings plugin.
  • If the Match threshold is met, the issue should be closed automatically as unplanned.
  • If the Warning threshold is met, the bot would display the similar issues found via the vector embeddings.

Please let me know if there are any errors or if further adjustments are needed.
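The behaviour summarised above, combined with the "target and edit this comment if it currently exists" requirement from the specification, could be planned roughly as follows. This is only a sketch: the `Action` type, the hidden `MARKER` string, and `planActions` are hypothetical, and it assumes the similarity scores have already been computed. The `not_planned` reason corresponds to the `state_reason` value GitHub's REST API uses for "close as unplanned".

```typescript
// Hypothetical types; nothing here calls the GitHub API, it only decides what to do.
interface SimilarIssue {
  url: string;
  similarity: number; // fraction between 0 and 1
}

interface ExistingComment {
  id: number;
  body: string;
}

type Action =
  | { kind: "close"; reason: "not_planned" }
  | { kind: "create-comment"; body: string }
  | { kind: "update-comment"; commentId: number; body: string };

// Assumed hidden marker used to find the bot's earlier warning comment.
const MARKER = "<!-- issue-dedupe -->";

function planActions(
  similar: SimilarIssue[],
  existingComments: ExistingComment[],
  thresholds = { warning: 0.75, match: 0.95 }
): Action[] {
  // Keep only issues at or above the warning level, most similar first.
  const flagged = similar
    .filter((s) => s.similarity >= thresholds.warning)
    .sort((x, y) => y.similarity - x.similarity);
  if (flagged.length === 0) return [];

  // Match level: close the new issue as unplanned.
  if (flagged[0].similarity >= thresholds.match) {
    return [{ kind: "close", reason: "not_planned" }];
  }

  // Warning level: list similar issues, editing the earlier comment if one exists.
  const lines = flagged.map((s) => `- ${s.url} (${Math.round(s.similarity * 100)}%)`);
  const body = [MARKER, "Similar open issues:", ...lines].join("\n");
  const previous = existingComments.find((c) => c.body.includes(MARKER));
  return previous
    ? [{ kind: "update-comment", commentId: previous.id, body }]
    : [{ kind: "create-comment", body }];
}
```

Returning a plan instead of calling the API directly keeps the decision logic testable without network access.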

@0x4007
Member Author

0x4007 commented Sep 9, 2024

All correct

@Keyrxng
Contributor

Keyrxng commented Sep 9, 2024

Should this plugin just be rebranded to embeddings-plugin so we can keep all embedding-related features nice and compact? All related features require the same vector DB, or will require multiple vector DBs depending on how other embedding types are stored.

A rebranding makes sense because issue_comment_embeddings suggests it's only creating them, whereas embeddings-plugin sounds like it has many use cases.

Features such as issueDedupe and others could be enabled or disabled in the plugin settings according to what a partner wants from their embeddings.
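Per-feature toggles of that kind might look something like the sketch below. The setting names (`issueDedupe`, `enabled`, `warningThreshold`, `matchThreshold`) and the choice to default the feature to disabled are assumptions made for illustration, not the plugin's actual configuration schema.

```typescript
// Hypothetical settings shape for an "embeddings-plugin" with optional features.
interface DedupeSettings {
  enabled: boolean;
  warningThreshold: number; // fraction, e.g. 0.75
  matchThreshold: number; // fraction, e.g. 0.95
}

interface EmbeddingsPluginSettings {
  issueDedupe: DedupeSettings;
}

// Assumed defaults: feature off unless a partner enables it.
const DEFAULTS: EmbeddingsPluginSettings = {
  issueDedupe: { enabled: false, warningThreshold: 0.75, matchThreshold: 0.95 },
};

// Merge partner overrides onto the defaults and reject inconsistent thresholds.
function resolveSettings(
  overrides: { issueDedupe?: Partial<DedupeSettings> } = {}
): EmbeddingsPluginSettings {
  const issueDedupe = { ...DEFAULTS.issueDedupe, ...overrides.issueDedupe };
  const { warningThreshold: w, matchThreshold: m } = issueDedupe;
  if (!(w > 0 && w <= m && m <= 1)) {
    throw new Error(`Invalid thresholds: warning=${w}, match=${m}`);
  }
  return { issueDedupe };
}
```

For example, `resolveSettings({ issueDedupe: { enabled: true } })` would turn the feature on while keeping the default thresholds.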

@0x4007
Member Author

0x4007 commented Sep 9, 2024

It's done

@sshivaditya2019
Collaborator

I think this is blocked by #8; once we are able to nail down a schema, this should be good to go.

@0x4007
Member Author

0x4007 commented Sep 11, 2024

I decided on a schema there

Contributor

ubiquity-os bot commented Sep 11, 2024

@sshivaditya2019, this task has been idle for a while. Please provide an update.

@sshivaditya2019
Collaborator

> @sshivaditya2019, this task has been idle for a while. Please provide an update.

Combining in PR #9

@sshivaditya2019
Collaborator

sshivaditya2019 commented Sep 12, 2024

@0x4007 Does the ubiquibot-kernel support the issue.created, issue.edited and issue.deleted event emitter types? I am getting TransformDecodeCheckError: Unable to decode value as it does not match the expected schema for that.

I think this is an issue with ubiquibot-kernel, as requests are not being passed to the worker.

@0x4007
Member Author

0x4007 commented Sep 12, 2024

> @sshivaditya2019, this task has been idle for a while. Please provide an update.
>
> Combining in PR #9

Pulls need to be separated.

> @0x4007 Does the ubiquibot-kernel support the issue.created, issue.edited and issue.deleted event emitter types? I am getting TransformDecodeCheckError: Unable to decode value as it does not match the expected schema for that.
>
> I think this is an issue with ubiquibot-kernel, as requests are not being passed to the worker.

Try issues plural instead. Check the type definitions.
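For context, GitHub's webhook event for issues is named `issues`, and its actions include `opened`, `edited`, and `deleted`; a newly created issue arrives as `issues.opened`. A rough sketch of guarding on those names is below. The kernel's own type definitions are not shown in this thread, so `SUPPORTED_EVENTS` and `isSupportedEvent` are hypothetical helpers that assume the kernel follows GitHub's `event.action` naming.

```typescript
// GitHub webhook names use the plural "issues" prefix, e.g. "issues.opened".
// Assumption: the kernel's emitter types follow the same "event.action" form.
const SUPPORTED_EVENTS = ["issues.opened", "issues.edited", "issues.deleted"] as const;

type SupportedEvent = (typeof SUPPORTED_EVENTS)[number];

// Type guard: true only for the event names this plugin would handle.
function isSupportedEvent(name: string): name is SupportedEvent {
  return (SUPPORTED_EVENTS as readonly string[]).includes(name);
}
```

With this guard, a singular name such as `issue.edited` is rejected up front instead of failing later during schema decoding.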

Contributor

ubiquity-os bot commented Sep 16, 2024

[ 609.534 WXDAI ]

@sshivaditya2019
Contributions Overview
View      Contribution    Count   Reward
Issue     Task            1       600
Issue     Comment         6       9.534
Review    Comment         12      0
Conversation Incentives
Comment Formatting Relevance Reward
This could be managed with a separate plugin if necessary. If …
2.79
content:
  p:
    symbols:
      \b\w+\b:
        count: 48
        multiplier: 0.1
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 1
multiplier: 1
0.8 2.232
Is it possible to retrieve the currently active plugins from wit…
0.88
content:
  p:
    symbols:
      \b\w+\b:
        count: 13
        multiplier: 0.1
    score: 1
multiplier: 1
0.3 0.264
@0x4007 I think it would be better to make this as an extensio…
5.09
content:
  p:
    symbols:
      \b\w+\b:
        count: 76
        multiplier: 0.1
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 11
        multiplier: 0.1
    score: 1
  ul:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 1
  li:
    symbols:
      \b\w+\b:
        count: 3
        multiplier: 0.1
    score: 1
multiplier: 1
0.9 4.581
I think this blocked by #8, once we are able to nail down a sche…
1.33
content:
  p:
    symbols:
      \b\w+\b:
        count: 21
        multiplier: 0.1
    score: 1
multiplier: 1
0.6 0.798
Combining in PR #9
0.32
content:
  p:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0.1
    score: 1
multiplier: 1
0.2 0.064
@0x4007 Does the `ubiquibot-kernel`, support `issue.…
3.19
content:
  p:
    symbols:
      \b\w+\b:
        count: 29
        multiplier: 0.1
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 23
        multiplier: 0.1
    score: 1
multiplier: 1
0.5 1.595
Resolves #6
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0
    score: 1
multiplier: 0
0.1 -
@0x4007 This function handles only the deduplication process, if…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 43
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
A cosine similarity of 0.75 appears quite close for identifying …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 58
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Removed.
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Removed Labels. Labels will not be added on issue close
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 10
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
So, for both `Match` and `Warning` threshold, th…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 16
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Added, Now it fetches the values from the `context.config …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 8
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
@0x4007 I have tried to make a few examples, let me know if hav…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 31
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Added They will display the cosine similarity in percentage afte…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 30
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
95%: - [First Comment](https://github.com/sshivaditya2019/test-…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 22
        multiplier: 0.2
    score: 1
  ul:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
  li:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0.2
    score: 1
  a:
    symbols:
      \b\w+\b:
        count: 11
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Fixed that, it now returns the similar issue in both `MATCH…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 23
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
  ul:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
  li:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0.2
    score: 1
  a:
    symbols:
      \b\w+\b:
        count: 8
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
That's the first issue of that type, so its expected to not have…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 63
        multiplier: 0.2
    score: 1
multiplier: 0
1 -

[ 41.438 WXDAI ]

@0x4007
Contributions Overview
View      Contribution    Count   Reward
Issue     Specification   1       24.5
Issue     Comment         8       5.388
Review    Comment         12      11.55
Conversation Incentives
Comment Formatting Relevance Reward
On `issue.created` and `issue.edited` we should …
24.5
content:
  p:
    symbols:
      \b\w+\b:
        count: 24
        multiplier: 0.1
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 5
  ol:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 0
  li:
    symbols:
      \b\w+\b:
        count: 93
        multiplier: 0.1
    score: 1
  ul:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 0
multiplier: 3
1 24.5
@sshivaditya2019 rfc
0.36
content:
  p:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
multiplier: 1
0.1 0.036
Reducing coupling is preferred just as long as it doesn't make t…
2.11
content:
  p:
    symbols:
      \b\w+\b:
        count: 16
        multiplier: 0.2
    score: 1
multiplier: 1
0.2 0.422
@gentlementlegen @Keyrxng RFC I'm assuming by parsing the curren…
3.4
content:
  p:
    symbols:
      \b\w+\b:
        count: 28
        multiplier: 0.2
    score: 1
multiplier: 1
0.3 1.02
You should check the original conversation and pull instead of s…
5.08
content:
  p:
    symbols:
      \b\w+\b:
        count: 45
        multiplier: 0.2
    score: 1
multiplier: 1
0.7 3.556
All correct
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
multiplier: 1
- -
It's done
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 3
        multiplier: 0.2
    score: 1
multiplier: 1
- -
I decided on a schema there
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 6
        multiplier: 0.2
    score: 1
multiplier: 1
- -
Pulls need to be separated. Try issues plural instead. Check th…
1.77
content:
  p:
    symbols:
      \b\w+\b:
        count: 13
        multiplier: 0.2
    score: 1
multiplier: 1
0.2 0.354
- Adding labels is out of scope. Don't do that. Close it as unpl…
2.45
content:
  ul:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 1
  li:
    symbols:
      \b\w+\b:
        count: 41
        multiplier: 0.1
    score: 1
multiplier: 1
1 2.45
Cool just needs configuration and I can merge.
0.59
content:
  p:
    symbols:
      \b\w+\b:
        count: 8
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.59
I'm assuming it all works. Code looks good.
0.65
content:
  p:
    symbols:
      \b\w+\b:
        count: 9
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.65
Needs to comment the similar issue link
0.52
content:
  p:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.52
It needs to always let the user know which it thinks is similar.…
1.65
content:
  p:
    symbols:
      \b\w+\b:
        count: 27
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.65
Why did you do 50%?
0.39
content:
  p:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.39
No labels
0.18
content:
  p:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.18
What is going on here? Reopening issues is out of scope.
0.77
content:
  p:
    symbols:
      \b\w+\b:
        count: 11
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.77
These should be configurable values. Can you see how configurati…
1.06
content:
  p:
    symbols:
      \b\w+\b:
        count: 16
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.06
Can you link your issue where you tested so we can see the resul…
0.94
content:
  p:
    symbols:
      \b\w+\b:
        count: 14
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.94
Okay it seems like you aren't following the spec again. Needs t…
1.65
content:
  p:
    symbols:
      \b\w+\b:
        count: 27
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.65
Doesn't look like it in the [first one ](https://github.com/sshi…
0.7
content:
  p:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.1
    score: 1
  a:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.7

[ 16.765 WXDAI ]

@Keyrxng
Contributions Overview
View      Contribution    Count   Reward
Issue     Comment         3       16.765
Conversation Incentives
Comment Formatting Relevance Reward
Yeah by parsing the private config file but I'm not sure why you…
15.61
content:
  p:
    symbols:
      \b\w+\b:
        count: 355
        multiplier: 0.1
    score: 1
  hr:
    symbols:
      \b\w+\b:
        count: 3
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0.1
    score: 1
  ul:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 1
  li:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 1
multiplier: 1
0.8 12.488
My mistake I misread this as within the org and I didn't think t…
4.97
content:
  p:
    symbols:
      \b\w+\b:
        count: 99
        multiplier: 0.1
    score: 1
multiplier: 1
0.2 0.994
Should this plugin just be rebranded to `embeddings-plugin…
4.69
content:
  p:
    symbols:
      \b\w+\b:
        count: 82
        multiplier: 0.1
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 6
        multiplier: 0.1
    score: 1
multiplier: 1
0.7 3.283

[ 0.32 WXDAI ]

@whilefoo
Contributions Overview
View      Contribution    Count   Reward
Issue     Comment         1       0.32
Conversation Incentives
Comment Formatting Relevance Reward
No we don't have that method. Since both plugins would share th…
0.4
content:
  p:
    symbols:
      \b\w+\b:
        count: 26
        multiplier: 0.1
    score: 1
multiplier: 0.25
0.8 0.32

@0x4007
Member Author

0x4007 commented Sep 16, 2024

@ubiquibot/software-development can somebody install this?

4 participants