Using code2prompt open large git repository and generate prompts for LLMs #27

Open
LZING opened this issue Jun 25, 2024 · 1 comment

LZING commented Jun 25, 2024

Hi, mufeedvh. Thank you for a very nice application.

I'm running into a problem with large code repositories. With small repositories, code2prompt works great, but with large ones the generated prompt overflows the token limit when I send it to an LLM.

So how should we deal with large code repositories? Sending only part of the source code loses context. Right now it seems that only Gemini 1.5 Pro can handle about 2M tokens, and that is the upper limit.

Can you perform tuning on a large code repository? Or do you have any good suggestions?

@bhanub2406

Hi @LZING
I don't really have a solution for your problem, but I have a couple of observations from my experience:

  1. Including a large code repository makes the resulting prompt very large, which many LLMs do not support. Even when they do support it, the quality of the output may not meet your expectations.
  2. The full context of the code may not be needed for every use case. For example, the find-security-vulnerabilities, GitHub commit, and GitHub pull request templates need only part of your code.
  3. You can use arguments such as --exclude and --include to reduce the amount of code that code2prompt processes. If your requirements are more complex than that, you can write a pre-processing script that copies the required files and folders into a temp folder, which can then be passed to code2prompt.
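For the pre-processing script mentioned in point 3, here is a minimal sketch in Python. It is not part of code2prompt: the function name `stage_files`, the glob patterns, and the `max_total_bytes` budget are all hypothetical choices for illustration, and the byte budget is only a rough stand-in for a token limit.

```python
import fnmatch
import shutil
import tempfile
from pathlib import Path


def stage_files(repo_root, include_patterns, max_total_bytes=None):
    """Copy files matching any glob pattern into a fresh temp folder.

    Hypothetical helper, not a code2prompt API. Returns a tuple of
    (staging_dir, copied_relative_paths). A file that would push the
    running total past max_total_bytes is skipped.
    """
    repo_root = Path(repo_root)
    staging_dir = Path(tempfile.mkdtemp(prefix="code2prompt_stage_"))
    copied, total = [], 0

    for path in sorted(repo_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(repo_root)
        # Never copy git metadata into the staging folder.
        if ".git" in rel.parts:
            continue
        rel_posix = rel.as_posix()
        # Note: fnmatch's "*" also matches "/", so "*.py" matches "src/a.py".
        if not any(fnmatch.fnmatch(rel_posix, pat) for pat in include_patterns):
            continue
        size = path.stat().st_size
        if max_total_bytes is not None and total + size > max_total_bytes:
            continue
        dest = staging_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
        copied.append(rel_posix)
        total += size

    return staging_dir, copied
```

Calling something like `stage_files("path/to/repo", ["src/*.py"], max_total_bytes=500_000)` returns a folder that can be given to code2prompt in place of the full repository. The relative layout is preserved, so the paths in the generated prompt still match the original project.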
