Any plans to introduce code indexing? #16
Comments
Hi @RastislavKish, this is an interesting feature request. Dividing prompts into multiple chunks would lose important context when working with the entire codebase, and the context window applies to the entire conversation with the LLM: it acts as a sliding window that drops earlier context as more tokens are consumed. Could you please describe how you'd be using such a feature? I'll see if I can come up with a design tailored to your needs.
Aider gives some clues on how you could compress the context by looking only at the symbols in the code: https://aider.chat/2023/10/22/repomap.html. Perhaps you could consider something similar.
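To give a sense of what a symbol-level map looks like, here is a minimal sketch for a Python codebase using only the standard library `ast` module. Aider itself uses tree-sitter and ranks symbols by relevance, so this is an illustrative approximation, not aider's implementation:

```python
# Sketch: emit a compact "repo map" listing only top-level classes and
# functions per file, so an LLM can see the shape of a codebase cheaply.
import ast
from pathlib import Path

def repo_map(root: str) -> str:
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse cleanly
        symbols = [n.name for n in tree.body
                   if isinstance(n, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))]
        if symbols:
            lines.append(f"{path}:")
            lines += [f"    {name}" for name in symbols]
    return "\n".join(lines)

print(repo_map("."))
```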
I once built a really simple Python script that traversed the code files in a project, pulled out each function (and maybe each symbol?), and created a single markdown file with the parameters, return type, and comments, all grouped by file. It was probably a hacky solution to this problem (context windows were smaller then), but it does help the LLM get broad overall context if you also feed it the full details of the sections of code most relevant to the specific problem you want help with.
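A script like the one described above could look roughly like this. Again a hedged sketch assuming Python sources and the standard library; the output file name and markdown layout are made up for illustration:

```python
# Sketch: summarize every function in a project into one markdown file,
# grouped by source file: signature, return annotation, first docstring line.
import ast
from pathlib import Path

def summarize_project(root: str) -> str:
    sections = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
        funcs = [n for n in ast.walk(tree)
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
        if not funcs:
            continue
        sections.append(f"## {path}")
        for fn in funcs:
            params = ", ".join(a.arg for a in fn.args.args)
            returns = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
            sections.append(f"- `{fn.name}({params}){returns}`")
            doc = ast.get_docstring(fn)
            if doc:
                sections.append(f"  - {doc.splitlines()[0]}")
    return "\n".join(sections)

Path("codebase_summary.md").write_text(summarize_project("."), encoding="utf-8")
```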
Hello,
first of all, cool project!
Larger codebases often significantly exceed even the largest context windows available these days, while offline LLMs are even more troublesome in this regard.
It could be useful to implement an indexing feature that would not generate a single prompt from the codebase, but instead output multiple smaller prompts containing at most N tokens each, with the purpose of creating some kind of code abstraction. This abstraction could afterwards be used together with just the single file of code where modifications should be made.
I don't use LLMs for coding very frequently, but this seems like the only plausible approach for fitting large codebases into LLMs. Have you made any considerations or experiments with this approach and its possible implementation in code2prompt?
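To make the proposal concrete, here is a rough sketch of the splitting step described above: break a generated prompt into chunks of at most N tokens. The token count below is a crude whitespace estimate (a real tokenizer such as tiktoken would be more accurate), and the function name is hypothetical:

```python
# Sketch: split a large generated prompt into chunks of at most max_tokens,
# so each chunk can be summarized separately into a code abstraction.
def chunk_prompt(prompt: str, max_tokens: int = 2000) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for line in prompt.splitlines():
        line_tokens = max(1, len(line.split()))  # crude token estimate
        if count + line_tokens > max_tokens and current:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += line_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks
```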