Font lock for embedded Elixir #2

Open
hochata opened this issue Mar 12, 2023 · 9 comments

Comments

@hochata

hochata commented Mar 12, 2023

Hi!

Currently, code in <%= "embedded expressions" %> gets the default face. It would be nice to have Elixir syntax highlighting in those blocks.

@wkirschbaum
Owner

wkirschbaum commented Mar 12, 2023

I agree :) but I'm not sure how to approach this yet. We don't get much information from the tree-sitter-heex grammar, as a directive (embedded expression) might look like this:

(directive [1, 2] - [1, 19]
      (partial_expression_value [1, 5] - [1, 16]))
 ....
(directive [3, 2] - [3, 11]
   (ending_expression_value [3, 4] - [3, 8]))

where the directive neither encloses nor reveals the content.

Another option would be to overlay Elixir syntax on top of the HEEx syntax, but elixir-ts-mode currently requires heex-ts-mode, so we would run into a circular dependency.

Unfortunately, with the MELPA commit requirements we have to have two separate packages for these two modes. We can combine the Elixir and HEEx modes on Emacs master (Emacs 30 only) and then overlay, but that is at least a year away, so it would be nice to have an interim solution.

The option I am hoping for is to enhance the tree-sitter-heex grammar with enough information to apply some faces.

@hochata
Author

hochata commented Mar 13, 2023

> The option I am hoping for is to enhance the tree-sitter-heex grammar with enough information to apply some faces.

I think that is a good idea. Another option is to move all the common code (like queries and indentation rules) from elixir-ts-mode and heex-ts-mode into a common elixir-ts-base file so there are no cyclic dependencies.

@KaranAhlawat

Could we do something like injecting the elixir grammar into the enclosed regions? From what I know injecting grammars into ranges is pretty common in the tree-sitter world, and last I checked it is supported in Emacs' tree-sitter (treesit) implementation as well.

@KaranAhlawat

KaranAhlawat commented Apr 15, 2024

We can probably do something like this here. Neovim is able to take advantage of these injections.scm files, and I'm hopeful we can reproduce this (to the best of treesit's current capabilities) inside Emacs.
https://github.com/phoenixframework/tree-sitter-heex/blob/main/queries/injections.scm
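
A rough sketch of what the Emacs side of such an injection might look like, assuming the heex and elixir grammars are installed and that directive contents show up as the partial_expression_value and ending_expression_value nodes from the tree dump earlier in this thread. The names my-heex--range-settings and my-heex--setup-ranges are made up for illustration, and this is not what heex-ts-mode currently does:

```elisp
;; Sketch only, under the assumptions stated above.
(require 'treesit)

(defvar my-heex--range-settings         ; hypothetical name
  (treesit-range-rules
   :embed 'elixir
   :host 'heex
   ;; Hand the contents of HEEx directives to the Elixir parser.
   '((directive [(partial_expression_value)
                 (ending_expression_value)] @elixir)))
  "Range rules that embed Elixir inside HEEx directives.")

(defun my-heex--setup-ranges ()         ; hypothetical helper
  "Create the host parser and register the range rules in this buffer."
  (when (and (treesit-ready-p 'heex)
             (treesit-ready-p 'elixir))
    (treesit-parser-create 'heex)
    (setq-local treesit-range-settings my-heex--range-settings)))
```

The Elixir font-lock rules would still need to be appended to treesit-font-lock-settings for the embedded ranges to actually get faces.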

@wkirschbaum
Owner

A pull request would be welcome; I won't be able to work on this in the near future.

@KaranAhlawat

I'll try my hand at it then!

@elken

elken commented Dec 21, 2024

Seems we're on a tangential venture!

I'm looking at a similar TS mode for ERB (Ruby on Rails' version of heex) and I'm trying to rebase it around heex since mine has a few bugs in it. Though I do have font-locking working fine, so maybe I can feed back and help you guys along.

You can have multiple parsers running on the buffer at once, and each one will have its own representation of what the buffer state is. Of course this means sometimes they get errors, which don't inherently break the file, but it means you have to also keep track of where the bounds of "code" are.

I'll add in some code examples below, but bear in mind that my mode is very much in active development 😄

In my mode setup I have

(when (treesit-ready-p 'embedded-template)
    (treesit-parser-create 'embedded-template)
    (treesit-parser-create 'html)
    (treesit-parser-create 'ruby)
    (erb-ts-setup))

That sets everything up nicely. The setup function is below; it mostly configures the basics, plus the key part: setting up the ranges and hooking up the parsers.

(defun erb-ts-setup ()
  "Setup treesit for erb-ts-mode."
  (setq-local electric-pair-pairs
              '((?\< . ?\>)
                (?\% . ?\%)
                (?\{ . ?\})
                (?\( . ?\))
                (?\[ . ?\])
                (?\' . ?\')
                (?\" . ?\")))
  
  (setq-local treesit-font-lock-settings
              (append (ruby-ts--font-lock-settings 'ruby)
                      (apply #'treesit-font-lock-rules
                             erb-ts-font-lock-rules)))
  
  (setq-local treesit-font-lock-feature-list
              '((erb-delimiter output directive comment)
                (tag attribute delimiter declaration)
                (keyword string const method-definition parameter-definition 
                         variable method builtin-variable builtin-constant 
                         builtin-function delimiter escape-sequence constant 
                         global instance interpolation literal symbol assignment)
                (bracket error function operator punctuation)))
  
  ;; Set up parsers and ranges
  (let ((ruby-parser (cl-find 'ruby (treesit-parser-list) :key #'treesit-parser-language))
        (html-parser (cl-find 'html (treesit-parser-list) :key #'treesit-parser-language))
        (erb-parser (cl-find 'embedded-template (treesit-parser-list) :key #'treesit-parser-language)))
    
    ;; Set up HTML parser to handle everything
    (treesit-parser-set-included-ranges html-parser 
                                       (list (cons (point-min) (point-max))))
    
    ;; Initial range setup for Ruby
    (erb-ts-update-ruby-ranges nil erb-parser)
    
    ;; Add after-change hook to update ranges
    (add-hook 'after-change-functions
              (lambda (beg end len)
                (erb-ts-update-ruby-ranges 
                 (list (cons beg end))
                 erb-parser))
              nil t))
  
  (treesit-major-mode-setup))

The range setup functions have gone through many iterations but seem to be somewhat stable now. I added a debug mode to highlight all the code areas, and it now seems to always be green at least. (Red areas would be ranges that have de-synced; it does happen sometimes, but I can't seem to reproduce it.)

[Video attachment: Screen.Recording.2024-12-21.at.21.35.09.mov]

So the range code is below

(defun erb-ts-in-ruby-content-p (pos)
  "Check if POS is within a Ruby code block."
  ;; Look up the node at POS in the embedded-template (ERB) parse tree.
  (let ((node (treesit-node-at pos 'embedded-template)))
    (and node
         (member (treesit-node-type node)
                 '("code" "output_directive" "directive")))))

(defun erb-ts-ruby-block-range (pos)
  "Get the full range of the Ruby block at POS.
Returns cons cell (START . END) or nil if not in Ruby block."
  ;; Look up the node at POS in the embedded-template (ERB) parse tree.
  (let ((node (treesit-node-at pos 'embedded-template)))
    (when node
      (let ((parent (treesit-node-parent node)))
        (when (and parent (member (treesit-node-type parent)
                                '("output_directive" "directive")))
          (let ((code-node (treesit-node-child-by-field-name parent "code")))
            (when code-node
              (cons (treesit-node-start code-node)
                    (treesit-node-end code-node)))))))))

(defun erb-ts-update-ruby-ranges (ranges parser)
  "Update Ruby parser ranges based on ERB code blocks."
  (when-let ((ruby-parser (cl-find 'ruby (treesit-parser-list)
                                  :key #'treesit-parser-language)))
    (let* ((root (treesit-parser-root-node parser))
           (new-ranges '())
           (current-ranges (treesit-parser-included-ranges ruby-parser)))
      
      ;; Get ranges from both types of directives
      (dolist (node (treesit-query-capture 
                    root 
                    '((directive (code) @ruby)
                      (output_directive (code) @ruby))))
        (let* ((node (cdr node))
               (start (treesit-node-start node))
               (end (treesit-node-end node)))
          (push (cons start end) new-ranges)))
      
      ;; Sort ranges
      (setq new-ranges (sort new-ranges (lambda (a b) (< (car a) (car b)))))
      
      ;; Only update if ranges have actually changed
      (when (and new-ranges
                 (not (equal new-ranges current-ranges)))
        (treesit-parser-set-included-ranges ruby-parser new-ranges)))))

But it is quite complex and I definitely need to try and refactor this.

Then the last bit is just setting all the styling up. As you saw from my setup function, I merge the font-lock settings from ruby-ts-mode with my own. I only define a few of those: mostly for the delimiters and comments, plus a few rules from html-ts-mode.

(defvar erb-ts-font-lock-rules
  '(:language embedded-template
    :feature output
    ((output_directive
      (code) @font-lock-variable-name-face))

    :language embedded-template
    :feature directive
    ((directive
      (code) @font-lock-preprocessor-face))
    
    :language embedded-template
    :feature comment
    ((comment_directive
      (comment) @font-lock-comment-face))
    
    :language embedded-template
    :feature erb-delimiter
    ((["<%=" "<%#" "<%" "%>" "-%>"] @font-lock-keyword-face))
    
    :language html
    :feature tag
    ;; Regular elements
    ((element
      (start_tag (tag_name) @font-lock-function-name-face))
     (element
      (end_tag (tag_name) @font-lock-function-name-face))
     (self_closing_tag
      (tag_name) @font-lock-function-name-face)
     ;; Special elements
     (script_element
      (start_tag (tag_name) @font-lock-keyword-face))
     (script_element
      (end_tag (tag_name) @font-lock-keyword-face))
     (style_element
      (start_tag (tag_name) @font-lock-keyword-face))
     (style_element
      (end_tag (tag_name) @font-lock-keyword-face)))

    :language html
    :feature attribute
    ((attribute
      (attribute_name) @font-lock-variable-name-face)
     (quoted_attribute_value) @font-lock-string-face)

    :language html
    :feature comment
    ((comment) @font-lock-comment-face)

    :language html
    :override t
    :feature declaration
    ((doctype) @font-lock-keyword-face)
    
    :language ruby
    :feature variable
    :override t
    ((call
      receiver: (identifier) @font-lock-variable-name-face) @_call
      (identifier) @font-lock-variable-name-face)

    :language ruby
    :feature method
    :override t  
    ((call
      method: (identifier) @font-lock-function-call-face) @_call))
  )

Sorry for the wall of text and code, but I hope this was useful! I would instead just dump this in a repo but for now it's easier to tinker locally until I have something stable.

@wkirschbaum
Owner

@elken thanks :), I will have a look soon.

@elken

elken commented Jan 6, 2025

I have since updated my understanding thanks to upcoming Emacs 30 changes that greatly simplify this. I've put my erb-ts-mode on hold for now, since the existing grammar is quite poor and writing my own led me down a rabbit hole. With those changes, though, all that's needed is to define the range rules.

(defvar erb-ts-mode--range-settings
  (treesit-range-rules
   :embed 'ruby
   :host 'embedded-template
   '((code) @capture)
   
   :embed 'javascript
   :host 'html
   :offset '(1 . -1)
   '((script_element
      (start_tag)
      (raw_text) @capture))
   
   :embed 'css
   :host 'html
   :offset '(1 . -1)
   '((style_element
      (start_tag)
      (raw_text) @capture))))

You can also set :local t on these rules if you want each captured range to get its own parser. More of the changes can be found here
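
For illustration, a hedged variant of the JavaScript rule above with :local set, so that each script body would be parsed by its own JavaScript parser instead of one parser spanning all of them. It assumes Emacs 30 or later and the html and javascript grammars; the variable name is hypothetical:

```elisp
;; Sketch only, under the assumptions stated above.
(require 'treesit)

(defvar erb-ts-mode--local-range-settings ; hypothetical name
  (treesit-range-rules
   :embed 'javascript
   :host 'html
   :local t              ; one JavaScript parser per captured range
   '((script_element (raw_text) @capture)))
  "Range rules giving every script element its own JavaScript parser.")
```

Whether local parsers are the right choice depends on the embedded language: separate script elements parse fine in isolation, while template code split across several delimiters generally needs a single parser that sees all the ranges.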


4 participants