Any plans to introduce code indexing? #16

Open
RastislavKish opened this issue Apr 25, 2024 · 3 comments

@RastislavKish

Hello,

first of all, a cool project!

Larger codebases often significantly exceed even the largest context windows available these days, while offline LLMs are even more troublesome in this regard.

It could be useful to implement an indexing feature that, instead of generating a single prompt from the codebase, outputs multiple smaller prompts of at most N tokens each, with the purpose of building some kind of abstraction of the code. This abstraction could afterwards be used together with just the single file of code where modifications should be made.

I don't use LLMs for coding very frequently, but this seems like the only plausible approach for fitting large codebases into LLMs. Have you considered or experimented with this approach and a possible implementation in code2prompt?
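To make the idea above concrete, here is a minimal sketch of splitting a codebase into prompts of at most N tokens. It is not part of code2prompt: `count_tokens` and `chunk_codebase` are hypothetical names, and the whitespace-based token count is only a rough stand-in for a real tokenizer.

```python
from typing import Dict, List


def count_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_codebase(files: Dict[str, str], max_tokens: int) -> List[str]:
    """Greedily pack source lines into prompts of at most max_tokens each.

    `files` maps a path to its source text (hypothetical input shape).
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for path, source in files.items():
        header = f"# file: {path}"
        for line in [header] + source.splitlines():
            cost = count_tokens(line)
            # Start a new chunk when this line would overflow the budget.
            if current and used + cost > max_tokens:
                chunks.append("\n".join(current))
                current, used = [], 0
            current.append(line)
            used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A single line longer than `max_tokens` still ends up in its own over-budget chunk, so a real implementation would need a fallback for that case. Each chunk could then be summarized separately to build the abstraction.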

@mufeedvh
Owner

Hi @RastislavKish, this is an interesting feature request. Dividing prompts into multiple chunks would lose important context when working with the entire codebase. The context window also applies to the entire conversation with an LLM: it acts as a sliding window that loses earlier context as we consume more tokens.

Could you please describe how you'd be using such a feature? I'll see if I can think of a feature that caters to your needs.
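The sliding-window concern can be illustrated with a small sketch. It assumes the oldest messages are dropped first once the budget is exceeded, which is a simplification; `visible_context` and the whitespace-based `count_tokens` are hypothetical helpers, not the behavior of any specific model or client.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def visible_context(messages: List[str], window: int) -> List[str]:
    """Return the most recent messages that still fit within `window` tokens."""
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first overflow.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > window:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

If the codebase is sent as several chunks, the earliest chunks are the first to fall out of such a window, which is the loss of context described above.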

@swiftugandan

Aider gives some clues on how you could compress the context by just looking at the symbols in the code. https://aider.chat/2023/10/22/repomap.html ... Perhaps you could consider something similar.

@dbenn8

dbenn8 commented Jul 19, 2024

I once built a really simple Python script that traversed the code files in a project, pulled out each function (and maybe symbol?), and created a single markdown file with the parameters, return type, and comments, all grouped by file.

It was probably a hacky solution to this problem (context windows were smaller then), but it does help the LLM get broad overall context if you also feed it the full details of the sections of code most relevant to the specific problem you want help with.
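A script along these lines could look roughly like the sketch below. It is not the original script: it assumes Python source files, uses the standard `ast` module (Python 3.9+ for `ast.unparse`), and takes the first docstring line in place of comments. `summarize_source` and `build_index` are hypothetical names.

```python
import ast
from typing import Dict, List


def summarize_source(source: str) -> List[str]:
    """Return one markdown bullet per function found in the source text."""
    entries: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = ast.unparse(node.args)
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            doc = ast.get_docstring(node) or ""
            entry = f"- `{node.name}({params}){returns}`"
            if doc:
                # Keep only the first docstring line to stay compact.
                entry += f": {doc.splitlines()[0]}"
            entries.append(entry)
    return entries


def build_index(files: Dict[str, str]) -> str:
    """Group function summaries by file into a single markdown document."""
    sections = []
    for path, source in files.items():
        sections.append("\n".join([f"## {path}"] + summarize_source(source)))
    return "\n\n".join(sections)
```

The resulting markdown can be pasted into a prompt ahead of the full text of the one or two files that matter for the task at hand.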


4 participants