Add file truncation #149
Comments
Hi @klntsky! I'm thinking of implementing this with a new repomix.config.json:
{
  "output": {
    // ... output config
  },
  "process": {
    "maxLines": 100, // Default limit for all files
    "patterns": [
      {
        "pattern": "**/*.json", // Special limits for JSON files
        "maxLines": 20
      }
    ]
  }
}
The output would look like:
{
  "users": [
    {
      "id": 1,
      "name": "John"
    }
  ]
... (truncated)
Let me know if this is heading in the right direction!
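To make the proposal above concrete, here is a minimal sketch of how a per-pattern `maxLines` limit could be resolved and applied. It is not Repomix's implementation: the `ProcessConfig` type, the helper names (`globToRegExp`, `resolveMaxLines`, `truncateFile`), and the tiny glob matcher (which only understands `**/` and `*`) are all assumptions made for illustration. A real implementation would more likely rely on an established glob library.

```typescript
// Hypothetical shape of the proposed "process" section.
interface PatternLimit {
  pattern: string;
  maxLines: number;
}

interface ProcessConfig {
  maxLines: number; // default limit for all files
  patterns: PatternLimit[]; // pattern-specific overrides
}

// Very small glob matcher for illustration only: supports "**/" and "*".
function globToRegExp(glob: string): RegExp {
  let re = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      re += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob[i] === "*") {
      re += "[^/]*"; // anything except a path separator
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      re += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${re}$`);
}

// The first matching pattern wins; otherwise the default limit applies.
function resolveMaxLines(filePath: string, config: ProcessConfig): number {
  const match = config.patterns.find((p) => globToRegExp(p.pattern).test(filePath));
  return match ? match.maxLines : config.maxLines;
}

// Keep the first N lines and append the marker shown in the proposal.
function truncateFile(filePath: string, content: string, config: ProcessConfig): string {
  const limit = resolveMaxLines(filePath, config);
  const lines = content.split("\n");
  if (lines.length <= limit) {
    return content;
  }
  return [...lines.slice(0, limit), "... (truncated)"].join("\n");
}
```

In this sketch the first matching pattern takes precedence over the global default; whether a later or more specific pattern should win instead is a design question the thread does not settle.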
In some cases it may be useful to limit characters or words rather than lines (e.g. for unformatted JSON). Maybe all three should be configurable?
@klntsky Given this context and considering how LLMs process text, I think focusing on token count would be the most appropriate approach initially. Something like:
{
  "process": {
    "maxTokens": 1000, // Global token limit
    "patterns": [
      {
        "pattern": "**/*.json",
        "maxTokens": 500 // Pattern-specific token limit
      }
    ]
  }
}
I'd like to start with this simpler requirement to minimize potential bugs. What do you think about this approach?
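A token limit could be applied along the lines of the sketch below. Everything here is an assumption for illustration: `truncateToTokens` and `roughTokenCount` are hypothetical names, and the default counter is a crude "about four characters per token" heuristic rather than a real tokenizer. In practice the counter would be swapped for whatever tokenizer the tool already uses.

```typescript
// A token counter is injected so a real tokenizer can replace the heuristic.
type TokenCounter = (text: string) => number;

// Rough stand-in, NOT a real tokenizer: assumes roughly 4 characters per token.
const roughTokenCount: TokenCounter = (text) => Math.ceil(text.length / 4);

// Keep whole lines while the running token estimate stays within the budget.
function truncateToTokens(
  content: string,
  maxTokens: number,
  count: TokenCounter = roughTokenCount,
): string {
  if (count(content) <= maxTokens) {
    return content; // already within budget
  }
  const kept: string[] = [];
  let used = 0;
  for (const line of content.split("\n")) {
    const cost = count(line + "\n");
    if (used + cost > maxTokens) {
      break; // the next line would exceed the budget
    }
    kept.push(line);
    used += cost;
  }
  return [...kept, "... (truncated)"].join("\n");
}
```

Cutting at line boundaries keeps the truncated output readable, at the cost of slightly undershooting the budget. Summing per-line counts is also only an approximation of the count for the joined text, since real tokenizers can merge characters across line breaks.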
Yep, token limits seem to cover both cases, but I'd like to have lines too: it's not immediately clear how many tokens there are in a given part of a file, whereas lines can be inspected visually.
That makes sense. Let me think about this a bit more.
The use case is: I have multiple JSON data files. I want to include them in the LLM input, but only to show their structure, not the contents. I'd like to be able to specify that I just want to include the first N lines.