[Task]: Spotting commands in the stream from coding assistants like cline #844

Open
therealnb opened this issue Jan 30, 2025 · 6 comments · Fixed by #917

@therealnb

therealnb commented Jan 30, 2025

Description

We have done some work to spot suspicious commands in #34. The task here is to write this code into codegate. This involves

Extensions for the future

  • Have more than two categories - e.g. safe, risky, and block
  • Block commands in the 'block' category
  • Have the block behaviour configurable
  • Have more options around context - e.g. files and dirs that are writable
  • Have the NN learn from feedback from the user (i.e. retrain the NN from feedback in the codegate UI)
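
The first three extensions above (more than two categories, blocking, and configurable block behaviour) could be sketched as follows. This is only an illustration: the `Policy` class, its threshold values, and the idea that the classifier emits a score between 0 and 1 are assumptions, not part of the existing code.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SAFE = "safe"
    RISKY = "risky"
    BLOCK = "block"


@dataclass(frozen=True)
class Policy:
    # Hypothetical thresholds on a classifier score in [0, 1].
    risky_threshold: float = 0.5
    block_threshold: float = 0.9
    # Makes the block behaviour configurable: when False, commands in the
    # 'block' category are still labelled but never actually blocked.
    block_enabled: bool = True


def categorise(score: float, policy: Policy) -> Category:
    """Map a suspiciousness score to one of the three categories."""
    if score >= policy.block_threshold:
        return Category.BLOCK
    if score >= policy.risky_threshold:
        return Category.RISKY
    return Category.SAFE


def should_block(score: float, policy: Policy) -> bool:
    """Block only when the policy allows it and the category is 'block'."""
    return policy.block_enabled and categorise(score, policy) is Category.BLOCK
```

Keeping the thresholds and the on/off switch in one object means the blocking behaviour can be changed without touching the classifier.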

We will probably have to intercept the commands at

snippets = extract_snippets(current_content)

and write the comment back at

async def _snippet_comment(self, snippet: CodeSnippet, context: PipelineContext) -> str:
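
A rough sketch of how a check might sit between those two points. Everything here is a stand-in: the `CodeSnippet` and `PipelineContext` classes below are simplified placeholders rather than codegate's real classes, `looks_suspicious` is a trivial keyword check standing in for the classifier, and the `alerts` field and warning text are invented for illustration.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CodeSnippet:
    # Simplified placeholder, not codegate's CodeSnippet.
    language: str
    code: str


@dataclass
class PipelineContext:
    # Simplified placeholder, not codegate's PipelineContext.
    alerts: list = field(default_factory=list)


# Assumption: only shell-like snippets are worth checking.
SHELL_LANGUAGES = {"bash", "sh", "shell", "zsh"}


def looks_suspicious(command: str) -> bool:
    """Placeholder for the real classifier: a naive keyword check."""
    piped_download = "curl" in command and "| sh" in command
    return "rm -rf" in command or piped_download


async def snippet_comment(snippet: CodeSnippet, context: PipelineContext) -> str:
    """Return a warning to append to the stream, or an empty string."""
    if snippet.language not in SHELL_LANGUAGES:
        return ""
    if not looks_suspicious(snippet.code):
        return ""
    context.alerts.append(snippet.code)  # record what was flagged
    return "\n\nWarning: this command looks suspicious. Review it before running.\n"
```

For example, `asyncio.run(snippet_comment(CodeSnippet("bash", "rm -rf /"), PipelineContext()))` returns the warning text, while a Python snippet or a harmless shell command returns an empty string.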

As a baseline, we decided to use hybrid-all-MiniLM-L6-v2, i.e. all-MiniLM-L6-v2 embeddings with post-processing by a small ANN. We didn't want the extra cost of codebert, but the local ANN seems to produce some benefit.

Additional Context

We need to decide which model to use for the embeddings. all-MiniLM-L6-v2 works well, especially with a post-processing ANN step. It is already in codegate, so we get it for free. microsoft/codebert-base works better, as expected, but at a cost of 476 MB.
The ANNs are much smaller:
ls -lh | grep hybrid
-rw-r--r-- 1 nigel staff 228K 29 Jan 18:21 hybrid-all-MiniLM-L6-v2.model
-rw-r--r-- 1 nigel staff 420K 29 Jan 18:21 hybrid-microsoft-codebert-base.model
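
To make the "embedding plus small ANN" idea concrete, here is a minimal forward pass in plain Python. It is a sketch under stated assumptions, not the model in the files above: the single hidden layer, its size, and the random weights are invented, and the embedding is taken as an input rather than computed. The only fixed fact is that all-MiniLM-L6-v2 produces 384-dimensional embeddings.

```python
import math
import random

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2
HIDDEN_DIM = 16      # assumption: the real ANN's layer sizes are not given


def make_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply a dense layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def score(embedding, hidden_layer, output_layer):
    """Map one command embedding to a suspiciousness score in (0, 1)."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    hidden = relu(dense(embedding, hidden_layer))
    (logit,) = dense(hidden, output_layer)
    return sigmoid(logit)


rng = random.Random(0)
hidden_layer = make_layer(EMBEDDING_DIM, HIDDEN_DIM, rng)
output_layer = make_layer(HIDDEN_DIM, 1, rng)
# Stand-in for a real embedding of a shell command.
fake_embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
example_score = score(fake_embedding, hidden_layer, output_layer)
```

In practice the embedding would come from the sentence-embedding model and the weights from training on labelled commands; with untrained weights the score carries no meaning.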

@therealnb

Initial implementation in https://github.com//pull/917

Note that:

  • There has been little optimisation of the ANN
  • We need to work on getting better data
  • Performance may not be good enough - testing required

Accuracy: 0.88
Precision: 0.8823529411764706
Recall: 0.7894736842105263
F1 Score: 0.8333333333333333
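
For reference, the four figures above follow from a binary confusion matrix. The sketch below shows the calculation. The counts are an assumption: the issue does not state them, and 15 true positives, 2 false positives, 4 false negatives and 29 true negatives are simply one set of counts that reproduces the reported numbers.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # of the commands flagged, how many were truly suspicious
    recall = tp / (tp + fn)     # of the truly suspicious commands, how many were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Assumed counts, chosen only because they match the reported figures.
metrics = binary_metrics(tp=15, fp=2, fn=4, tn=29)
```

Under these assumed counts, the recall of about 0.79 would mean that roughly one suspicious command in five is missed, which is relevant to the "testing required" note above.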

@therealnb

This was reverted in #930.

It was causing the runner to run out of space. See the Slack discussion.

@therealnb

Another PR, #931, has been created to fix the build space problem.

@therealnb

This should be closed by #931.

@therealnb

Reopened this (again) and disabled the suspicious commands check in #1204. We need to restrict more tightly, by context, where this is run.

@therealnb

After discussion, we need this merged first: https://github.com/jhrozek/codegate-open/blob/51dfd5e50f50e2a9b5deb61afcc52297872520bc/src/codegate/pipeline/functions/output.py#L53 (and we possibly need to gather some of the tool semantics).

We will reconvene when that is done. Added @jhrozek as an assignee to flag when this is ready.

We also need to take a look at the top N MCP servers, to see what tool parameters they support.

We need to support 'experimental' flags - this is not specific to this case, but this would allow curious folk to switch features on and off.
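
One possible shape for such flags, as a sketch only: the environment variable name `CODEGATE_EXPERIMENTAL` and the feature name `suspicious_commands` are hypothetical, and nothing here reflects how codegate actually reads its configuration.

```python
import os

# Hypothetical variable holding a comma-separated list of experimental features.
ENV_VAR = "CODEGATE_EXPERIMENTAL"


def enabled_experiments(environ=None):
    """Return the set of experimental feature names that are switched on."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "")
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def is_enabled(feature, environ=None):
    """Check whether one experimental feature is switched on."""
    return feature.lower() in enabled_experiments(environ)
```

With `CODEGATE_EXPERIMENTAL=suspicious_commands` set, `is_enabled("suspicious_commands")` is true; with the variable unset, every experimental feature stays off by default.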

This does not block the accuracy work, which can proceed in parallel.

CC @lukehinds, @poppysec, @blkt
