Hi @mufeedvh, thank you for a very nice application.
I'm running into a problem with large code repositories. For small repositories, code2prompt works great, but for large ones the generated prompt overflows the LLM's token limit.
So how should we deal with large code repositories? Sending only part of the source code loses context, and right now it seems only Gemini 1.5 Pro can handle around 2M tokens, which is the upper limit.
Can code2prompt be tuned for large code repositories, or do you have any good suggestions?
Hi @LZING
I don't really have a solution for your problem, but I have a couple of observations from my experience:

- Including a large code repository makes the resulting prompt very large, which many LLMs do not support. Even when they do, the quality of the output may not meet your expectations.
- Full context of the code is not needed for every use case. For example, templates like find-security-vulnerabilities, or those related to GitHub commits and GitHub pull requests, need only a part of your code.
- You can use arguments like `--exclude` and `--include` to reduce the amount of code you send to code2prompt. If your requirements are more complex than that, you can write a pre-processing script that fetches the required files/folders into a temp folder, which in turn can be passed to code2prompt.
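The pre-processing idea above could be sketched roughly like this. This is a hypothetical helper (not part of code2prompt): `collect_subset` and its glob patterns are assumptions for illustration, and the final code2prompt invocation is shown only as a comment.

```python
# Hypothetical pre-processing sketch: copy only the files relevant to your
# question into a fresh temp folder, then point code2prompt at that folder.
import shutil
import tempfile
from pathlib import Path

def collect_subset(repo_root, patterns):
    """Copy files matching any of the glob patterns into a new temp dir,
    preserving their relative paths, and return that directory."""
    dest = Path(tempfile.mkdtemp(prefix="c2p_subset_"))
    root = Path(repo_root)
    for pattern in patterns:
        for src in root.rglob(pattern):
            if src.is_file():
                target = dest / src.relative_to(root)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, target)
    return dest

# Example (assumed file layout):
#   subset = collect_subset("my-repo", ["*.rs", "Cargo.toml"])
#   then run: code2prompt <subset-dir>
```

For simpler cases, `--include`/`--exclude` alone should be enough; a script like this only pays off when the selection logic (e.g. following module dependencies) is too complex for glob patterns.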