Use zero-width delimiters for role tracking in gptel-mode #565

Open
lispy-ai opened this issue Jan 13, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@lispy-ai

Title: Use zero-width delimiters for role tracking with overlay-based highlighting in gptel-mode buffers

This is a proposal for a new approach to tracking and visually distinguishing assistant/user roles in gptel buffers that addresses several long-standing issues (#321, #343) while maintaining compatibility with the existing system.

The Problem

Currently gptel uses text properties to track which sections of text are assistant responses. This approach has proven problematic because:

  1. Text properties don't interact naturally with standard Emacs editing operations
  2. Property stickiness creates ambiguous cases during editing
  3. Yanked text carries properties that can cause confusion
  4. Visual feedback about roles is difficult to implement reliably

Proposed Solution

Use zero-width Unicode characters as role delimiters with overlay-based highlighting, but only when gptel-mode is active:

  • U+200B (zero-width space) marks response start
  • U+200C (zero-width non-joiner) marks response end
  • Overlays provide visual distinction for responses

Key aspects:

  • Delimiters are invisible and don't affect buffer display
  • Standard editing operations work naturally
  • Cut/paste preserves role boundaries correctly
  • Overlays provide clean visual feedback
  • Works with all major modes
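To make the parsing side concrete, here is a minimal sketch of how a conversation could be recovered from delimited text. It is written in Python purely for illustration and is not gptel code; the function name `split_roles` and the "user"/"assistant" labels are my own assumptions, and an Emacs Lisp version would scan the buffer instead of a string.

```python
# Illustrative sketch only: not part of gptel. The two code points are the
# ones proposed above; everything else here is a hypothetical name.
RESPONSE_START = "\u200b"  # zero-width space marks response start
RESPONSE_END = "\u200c"    # zero-width non-joiner marks response end


def split_roles(text):
    """Split text into (role, content) pairs and drop the delimiters.

    Text outside a delimiter pair is treated as the user's prompt, text
    inside a pair as an assistant response. A delimiter that shows up in
    the wrong state (for example a second start inside a response) is kept
    as ordinary content rather than guessed at.
    """
    segments = []
    role = "user"
    chunk = []
    for ch in text:
        if ch == RESPONSE_START and role == "user":
            if chunk:
                segments.append((role, "".join(chunk)))
            role, chunk = "assistant", []
        elif ch == RESPONSE_END and role == "assistant":
            if chunk:
                segments.append((role, "".join(chunk)))
            role, chunk = "user", []
        else:
            chunk.append(ch)
    if chunk:
        segments.append((role, "".join(chunk)))
    return segments


if __name__ == "__main__":
    sample = "What is Emacs?" + RESPONSE_START + "An editor." + RESPONSE_END + "Thanks"
    for role, content in split_roles(sample):
        print(role, repr(content))
```

Because the delimiters travel with the text, cutting and pasting a whole response moves its boundaries with it, which is the property the proposal relies on.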

Implementation

The solution uses two phases:

  1. When gptel-mode is enabled:

    • Convert existing gptel text properties to delimiter pairs
    • Remove gptel properties
    • Enable delimiter-based role tracking
    • Create overlays for responses
  2. When gptel-mode is disabled:

    • Convert delimiters back to gptel properties
    • Remove delimiters
    • Remove overlays
    • Restore property-based tracking
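The two phases amount to a round trip between "response ranges" and "delimiters in the text". A rough sketch of that round trip, in Python for illustration only (not gptel code): the list of `(start, end)` offsets stands in for the ranges covered by the gptel text property, and both function names are hypothetical.

```python
# Illustrative sketch only: offsets stand in for text-property ranges.
RESPONSE_START = "\u200b"  # zero-width space
RESPONSE_END = "\u200c"    # zero-width non-joiner


def bounds_to_delimiters(text, bounds):
    """Phase 1: wrap each response range in a delimiter pair.

    `bounds` is assumed to be a sorted list of non-overlapping, half-open
    (start, end) offsets into `text`.
    """
    out = []
    pos = 0
    for start, end in bounds:
        out.append(text[pos:start])
        out.append(RESPONSE_START + text[start:end] + RESPONSE_END)
        pos = end
    out.append(text[pos:])
    return "".join(out)


def delimiters_to_bounds(marked):
    """Phase 2: strip the delimiters and recover (plain_text, bounds)."""
    plain = []
    bounds = []
    start = None  # offset in the plain text where the open response began
    n = 0         # number of plain characters emitted so far
    for ch in marked:
        if ch == RESPONSE_START:
            if start is None:
                start = n
        elif ch == RESPONSE_END:
            if start is not None:
                bounds.append((start, n))
                start = None
        else:
            plain.append(ch)
            n += 1
    if start is not None:
        # An unterminated response is taken to run to the end of the text.
        bounds.append((start, n))
    return "".join(plain), bounds


if __name__ == "__main__":
    text = "Q: hi\nA: hello\nQ: bye\n"
    marked = bounds_to_delimiters(text, [(6, 15)])
    print(delimiters_to_bounds(marked))
```

In this sketch stray delimiters are silently dropped on the way back, which is one possible policy; a real implementation would have to decide whether to repair or report them.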

Response Highlighting

Use overlays for visual distinction:

  • Clean visual distinction
  • No interference with text properties
  • Preserves other modes' fontification
  • Easy to customize appearance

Benefits

  1. Reliable editing operations:

    • Cut/paste works naturally
    • Undo/redo maintains role boundaries
    • No property stickiness issues
  2. Better user experience:

    • Clear visual distinction of responses
    • Predictable editing behaviour
    • Compatible with standard Emacs commands
    • Non-intrusive highlighting
  3. Technical improvements:

    • Simple to parse conversation history
    • Clean visual feedback via overlays
    • Works with all major modes
    • Separation of tracking and display

Testing

To test this change:

  1. Enable gptel-mode in a buffer with existing responses
  2. Verify properties convert to delimiters correctly
  3. Test editing operations (especially cut/paste)
  4. Verify overlay highlighting
  5. Disable mode and verify cleanup
  6. Check property restoration

Notes

  • Only affects buffers with gptel-mode active
  • Zero-width characters don't affect buffer display or export
  • Maintains compatibility with existing gptel features
  • Solves long-standing editing issues
  • Provides clean visual distinction via overlays

Caveat

There is an obvious caveat here: enabling the mode mutates the buffer. The characters I have chosen are highly unlikely to appear in regular text. Still, instead of hard-coding two characters, the delimiters could be made configurable via buffer-local variables or customisation, or selected automatically on entering gptel-mode from a set of candidate characters, picking ones that do not already appear in the buffer.
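The automatic selection could look roughly like this. It is a Python sketch for illustration, not gptel code; the candidate list and the name `pick_delimiters` are assumptions of mine, and the candidates are just a handful of invisible Unicode characters, not a vetted set.

```python
# Illustrative sketch only. The candidate set is an assumption: the two
# characters proposed above plus a few other invisible code points.
CANDIDATES = [
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u2060",  # word joiner
    "\u2063",  # invisible separator
    "\u2064",  # invisible plus
]


def pick_delimiters(text, candidates=CANDIDATES):
    """Return (start, end) delimiters that do not occur anywhere in `text`.

    The first two unused candidates are chosen, in list order. If fewer
    than two candidates are free, a ValueError is raised so the caller can
    fall back to some other strategy.
    """
    free = [c for c in candidates if c not in text]
    if len(free) < 2:
        raise ValueError("fewer than two unused candidate characters")
    return free[0], free[1]


if __name__ == "__main__":
    start, end = pick_delimiters("some buffer text containing \u200b already")
    print(hex(ord(start)), hex(ord(end)))
```

The chosen pair would then need to be recorded somewhere (for example in buffer-local variables) so that later scans use the same characters.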

Related issues: #321, #343

@axelknock
Contributor

I like this as a solution that would also make it very simple to edit responses, which is quite a powerful method for guiding output.

I can foresee situations where an odd number of separators exist in the buffer, which would cause gptel-send to fail. In that case a function gptel-mark-response could simply wrap a selected region with the separators, deleting any that are inside the active region. gptel-show-separators/gptel-hide-separators could also replace the separators with something visible for inspection. The latter would most usefully replace the separators with some indicator of message count, probably xml-like (<message_1> </message_1>).
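A rough sketch of those helpers, in Python for illustration only (not gptel code). The function names mirror the commands suggested above but are hypothetical, and the sketch assumes the U+200B/U+200C pair from the proposal.

```python
# Illustrative sketch only: hypothetical helpers, not gptel functions.
START = "\u200b"  # zero-width space, response start
END = "\u200c"    # zero-width non-joiner, response end


def is_balanced(text):
    """Return True if every start has a matching end and pairs don't nest."""
    inside = False
    for ch in text:
        if ch == START:
            if inside:
                return False
            inside = True
        elif ch == END:
            if not inside:
                return False
            inside = False
    return not inside


def mark_response(text, start, end):
    """Wrap text[start:end] in one delimiter pair, deleting any inside it.

    Delimiters outside the region are left alone, so a region that only
    partly overlaps an existing response can still leave the text unbalanced.
    """
    inner = text[start:end].replace(START, "").replace(END, "")
    return text[:start] + START + inner + END + text[end:]


def show_separators(text):
    """Replace each delimiter with a visible, numbered XML-like tag."""
    out = []
    count = 0
    for ch in text:
        if ch == START:
            count += 1
            out.append(f"<message_{count}>")
        elif ch == END:
            out.append(f"</message_{count}>")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    broken = "Q" + START + "A1 A2"            # the end delimiter was lost
    fixed = mark_response(broken, 1, len(broken))
    print(is_balanced(broken), is_balanced(fixed))
    print(show_separators(fixed))
```

Checking `is_balanced` before sending would let the failure be reported up front, and `mark_response` over the damaged span is then the repair step.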

I also feel this violates the central ethos of gptel that prevented karthik from using response indicators in the first place. You would end up with documents containing invisible characters if you copy-and-paste responses. But I do think it addresses the main issues in #546 without introducing more unacceptable problems. Backwards compatibility could be maintained by automatically dropping the separators in buffers where the previous method was used.

@lispy-ai
Author

I think the zero-width delimiter approach effectively addresses these concerns while maintaining gptel's simplicity:

  • Invisible but robust role tracking:

    • Zero-width delimiters mark response boundaries (carefully chosen to avoid text conflicts)
    • Overlays provide clear visual feedback of boundaries
    • Delimiters aren't saved to disk, preserving clean file format
    • Existing GPTEL_BOUNDS continue working normally
  • Optional safe editing operations in gptel-mode buffers:

    • Add advice to Emacs editing primitives to handle delimiters:
      (advice-add 'insert-before-markers :around #'gptel--clean-insertion-advice)
      (advice-add 'delete-region :around #'gptel--preserve-delimiters-advice)
    • Strip delimiters from inserted text
    • Preserve necessary delimiters at region boundaries during deletion
    • External editors can modify files without corruption
    • Copy/paste operations work cleanly
  • Recovery tools:

    • gptel-mark-response to mark region as response (or with prefix to mark as prompt)
    • gptel-validate-buffer to check and repair delimiter integrity
    • gptel-show-separators/gptel-hide-separators for visual inspection
    • Overlay system shows current prompt/response status clearly
  • Backwards compatibility:

    • No migration needed for existing chat logs
    • Delimiters recreated from bounds when loading buffer
    • Maintains the "everything up to cursor" interaction model

A simpler alternative would be to:

  • Skip the safe editing operations entirely
  • Rely on clear overlay feedback to show prompt/response regions
  • Trust users to maintain/repair their chat buffers as needed
  • Provide the same robust recovery tools above

This simpler approach might be preferable - users get immediate visual feedback about response regions and can easily fix any corruption using gptel-mark-response. The editing safeguards may be unnecessary complexity given good overlay feedback and repair tools.

All that said, and backtracking a bit, @daedsidog suggested in #343 that simply making regions explicitly visible and allowing them to be fixed up with gptel-mark-response (or gptel-toggle-response-role, per his suggestion) might be easiest because:

  1. It maintains the existing text property mechanism but adds explicit user control
  2. It avoids introducing new delimiter-related complexity and edge cases
  3. The visual feedback is also through overlays and makes it clear what's prompt vs response
  4. Manual region marking with gptel-mark-response gives users direct control of prompt vs response

The zero-width delimiter approach I've suggested, while elegant in some ways, introduces:

  • New edge cases around delimiter handling
  • Possibly complex advice on editing primitives if you take it that far
  • Potential for delimiter corruption requiring repair tools
  • Additional complexity in buffer management

On balance perhaps the simplest solution would be:

  1. Keep existing text property mechanism
  2. Add clear overlay-based visual feedback
  3. Provide gptel-mark-response command for manual region control
  4. Trust users to maintain their chat buffers with these tools

This would maintain gptel's existing M.O. while giving users the tools they need to manage/edit prompt/response regions effectively. The visual feedback through overlays addresses the "what is marked as what" problem, while manual region control handles edge cases without introducing new complexity.

The benefit of the zero-width delimiter solution (#565) is that with careful editing (avoiding region boundaries) you won't break the prompt/response sequence within a buffer. With the existing text property mechanism, by contrast, you will always break the sequence because of the way text properties are handled in Emacs. That is, more often than not, prompt/response regions will need to be fixed up after editing the buffer, notwithstanding the sticky patch 25efd55 that @karthink recently introduced to mitigate this (I've found myriad ways to break this with yank and other editing commands).

[2025-01-17 Fri 11:32]

@axelknock
Contributor

A potential way to introduce this without fundamentally changing how gptel works would be to add two customizable variables, such as gptel-response-start/gptel-response-end, which, when both are non-nil, split the buffer into prompts and responses as described above. Surfacing this in the transient menu would let users opt in to this behavior in some buffers and not others.
