Any plans to introduce code indexing? #16

RastislavKish · 2024-04-25T00:25:52Z

Hello,

first of all, a cool project!

Larger codebases often significantly exceed even the largest context windows available these days, while offline LLMs are even more troublesome in this regard.

It could be useful to implement an indexing feature that, instead of generating a single prompt from the codebase, outputs multiple smaller prompts of at most N tokens each, with the purpose of building some kind of abstraction of the code. This abstract could then be used together with just the single file of code where modifications should be made.
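To illustrate the idea, here is a minimal sketch of the splitting step only. It is not code2prompt behaviour: `approx_tokens` and `chunk_files` are hypothetical names, and the token count is a crude whitespace-based estimate rather than a real tokenizer.

```python
from typing import Dict, List


def approx_tokens(text: str) -> int:
    """Rough token estimate: count whitespace-separated words.

    This is an assumption for illustration; a real tokenizer would give
    different (usually larger) counts.
    """
    return len(text.split())


def chunk_files(files: Dict[str, str], max_tokens: int) -> List[List[str]]:
    """Greedily group file paths so each group's estimated size fits max_tokens.

    A single file that is larger than max_tokens still gets a chunk of its own.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path, source in files.items():
        cost = approx_tokens(source)
        # Start a new chunk when adding this file would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Each resulting group could then be summarised separately, and the summaries combined into the abstract described above.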

I don't use LLMs for coding very frequently, but this seems like the only plausible approach for fitting large codebases into LLMs. Have you considered or experimented with this approach and a possible implementation in code2prompt?

mufeedvh · 2024-05-30T03:13:27Z

Hi @RastislavKish, this is an interesting feature request. Dividing prompts into multiple chunks would lose important context when working with the entire codebase. The context window also applies to the entire conversation with an LLM; it just acts as a sliding window that loses earlier context as more tokens are consumed.

Could you please describe how you'd be using such a feature? I'll see if I can think of a feature that could be tailored to your needs.

swiftugandan · 2024-06-11T10:47:48Z

Aider gives some clues on how you could compress the context by just looking at the symbols in the code. https://aider.chat/2023/10/22/repomap.html ... Perhaps you could consider something similar.

dbenn8 · 2024-07-19T01:13:22Z

I once built a really simple Python script that traversed the code files in a project, pulled out each function (and maybe symbol?), and created a single markdown file with the parameters, return type, and comments, all grouped by file.

It was probably a hacky solution to this problem (context windows were smaller then), but it does help the LLM get broad overall context if you also feed it the full details of the sections of code most relevant to the specific problem you want help with.
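A rough sketch of that kind of outline script, assuming Python sources only and using the standard-library `ast` module, could look like the following. This is not the original script; `outline_source` is a hypothetical name, and it only lists plain positional parameters and the first docstring line.

```python
import ast


def outline_source(path: str, source: str) -> str:
    """Return a markdown outline of the functions defined in one Python source string."""
    lines = [f"## {path}", ""]
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Only regular positional parameters; *args, **kwargs and
            # keyword-only parameters are left out to keep the sketch short.
            params = ", ".join(arg.arg for arg in node.args.args)
            # ast.unparse requires Python 3.9 or newer.
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            lines.append(f"- `{node.name}({params}){returns}`")
            doc = ast.get_docstring(node)
            if doc:
                # Keep just the first docstring line as a short description.
                lines.append(f"  {doc.splitlines()[0]}")
    return "\n".join(lines)
```

Running this over every file in a project and joining the results would give one markdown file grouped by file, which can be sent alongside the full text of the files that matter for the task at hand.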
