Provide convenience method for loading language library #23

Closed
Marcono1234 opened this issue Sep 1, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@Marcono1234
Contributor

(Please correct me if anything of the following is wrong)

If I understand it correctly, every parser implementation has a tree_sitter_<lang> function, and it always has the same signature (`const TSLanguage *tree_sitter_<lang>(void)`).
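Because the C signature takes no arguments and returns a single pointer, one FFM descriptor fits every parser. Here is a minimal sketch, assuming Java 22+ with the final java.lang.foreign API. ParserSymbols and its members are hypothetical names for illustration and are not part of jtreesitter.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

// Hypothetical helper, not part of jtreesitter.
final class ParserSymbols {
    private ParserSymbols() {}

    // Matches `const TSLanguage *tree_sitter_<lang>(void)`:
    // no arguments and one pointer-sized return value.
    static final FunctionDescriptor LANGUAGE_FUNCTION = FunctionDescriptor.of(ValueLayout.ADDRESS);

    // Derives the exported symbol name, e.g. "python" -> "tree_sitter_python".
    static String functionName(String languageName) {
        return "tree_sitter_" + languageName;
    }
}
```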

Currently jtreesitter only provides a Language(MemorySegment) constructor, so you have to write boilerplate code that looks up the tree_sitter_<lang> function and invokes it (as the test code does). This can be an obstacle for new users of jtreesitter: they either have to be somewhat familiar with java.lang.foreign, or blindly copy code they don't understand.

It would be useful if Language provided a convenience method for this, for example:

```java
public static Language loadLanguage(SymbolLookup parserLibrary, String languageName)
```

The user could then load the library with SymbolLookup#libraryLookup and pass the resulting lookup to Language#loadLanguage.
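Accepting a SymbolLookup instead of a file path keeps such a method flexible. SymbolLookup is a functional interface, so the lookup can come from SymbolLookup.libraryLookup(...), from SymbolLookup.loaderLookup() after System.loadLibrary(...), or from a composite built with SymbolLookup#or. The following is a minimal sketch, assuming Java 22+. LookupDemo and its fake addresses are hypothetical stand-ins for real parser libraries.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.util.Optional;

// Hypothetical demo: any SymbolLookup, including a composite one, could be
// handed to a loadLanguage(SymbolLookup, String) style method.
final class LookupDemo {
    private LookupDemo() {}

    // Stand-in for a parser library exporting one symbol at a fake address.
    // The address is never dereferenced; it only shows which lookup answered.
    static SymbolLookup fakeLookup(String symbol, long address) {
        return name -> name.equals(symbol)
                ? Optional.of(MemorySegment.ofAddress(address))
                : Optional.empty();
    }

    // Searches the first "library", then falls back to the second.
    static SymbolLookup combined() {
        return fakeLookup("tree_sitter_python", 0x1000L)
                .or(fakeLookup("tree_sitter_json", 0x2000L));
    }
}
```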

If you want I can try to create a proof-of-concept PR for this.

@ObserverOfTime
Member

The plan is to eventually integrate those bindings into the parsers (see tree-sitter/tree-sitter-java#182).

@Marcono1234
Contributor Author

But that is specifically for tree-sitter-java, right? That would certainly be useful, but I was thinking of a more general solution for all parsers, e.g. Python, JSON, ... since they all have a tree_sitter_<lang> function with the same signature (?).

@ObserverOfTime
Member

The CLI will generate bindings for all parsers like it does for other languages.

ObserverOfTime added the enhancement label on Sep 1, 2024
@Marcono1234
Contributor Author

Marcono1234 commented Sep 1, 2024

Ah, I think I misunderstood you. Is the plan to generate Java bindings for all parsers, and the tree-sitter-java one was just an example? That would be great then!

But would it make sense nonetheless to add a generic loadLanguage method here, for cases where a repository does not include a bindings/java/.../TreeSitter<lang>.java yet?

I was thinking of something like this:

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public final class Language {
    /**
     * Loads a language using the given symbol lookup for the native library.
     * For example:
     * {@snippet lang=java :
     * Path pathToLibrary = Path.of("libtree-sitter-python.so");
     * // Arena.global() keeps the library loaded; with Arena.ofAuto() it could be
     * // unloaded after garbage collection while the Language still points into it.
     * SymbolLookup libraryLookup = SymbolLookup.libraryLookup(pathToLibrary, Arena.global());
     * Language language = Language.loadLanguage(libraryLookup, "python");
     * }
     *
     * @throws IllegalArgumentException If the Tree-sitter language function cannot be found using the symbol lookup
     */
    public static Language loadLanguage(SymbolLookup symbolLookup, String languageName) throws IllegalArgumentException {
        // Every parser exports `const TSLanguage *tree_sitter_<lang>(void)`.
        String functionName = "tree_sitter_" + languageName;
        MemorySegment functionAddress = symbolLookup.find(functionName)
            .orElseThrow(() -> new IllegalArgumentException("Language function '%s' not found".formatted(functionName)));

        // Pointer return with an unbounded byte target layout, so the returned segment is not zero-length.
        var voidPtr = ValueLayout.ADDRESS.withTargetLayout(MemoryLayout.sequenceLayout(Long.MAX_VALUE, ValueLayout.JAVA_BYTE));
        var funcDesc = FunctionDescriptor.of(voidPtr);
        MethodHandle function = Linker.nativeLinker().downcallHandle(functionAddress, funcDesc);
        MemorySegment languagePointer;
        try {
            // The TSLanguage struct is static data owned by the library; expose it read-only.
            languagePointer = ((MemorySegment) function.invokeExact()).asReadOnly();
        } catch (Throwable t) {
            throw new RuntimeException("Failed to call language function", t);
        }

        return new Language(languagePointer);
    }

    /**
     * Creates a new instance from the given language pointer.
     *
     * <p>Normally you don't have to obtain the language pointer yourself. Instead, you can either use the
     * generated Java bindings for a parser, for example:
     * {@snippet lang=java :
     * var pointer = TreeSitterPython.language();
     * Language language = new Language(pointer);
     * }
     * Or you can use {@link #loadLanguage(SymbolLookup, String)} to obtain a {@code Language} instance.
     *
     * @implNote It is up to the caller to ensure that the pointer is valid.
     *
     * @throws IllegalArgumentException If the language version is incompatible.
     */
    public Language(MemorySegment address) {
        // ...
    }

    // ...
}
```
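The lookup-and-invoke logic can be exercised without jtreesitter or a compiled parser. Here is a minimal sketch, assuming Java 22+. LanguagePointerLoader and FakeParserLibrary are hypothetical. The loader returns the raw pointer at the point where the proposal would call new Language(...). The fake library uses Linker#upcallStub to expose a constant-returning method handle as a native function pointer, so the real downcall path runs end to end.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Optional;

// Hypothetical standalone loader: same steps as the proposal, but it returns the
// raw TSLanguage pointer instead of wrapping it in a jtreesitter Language.
final class LanguagePointerLoader {
    private LanguagePointerLoader() {}

    static MemorySegment load(SymbolLookup lookup, String languageName) {
        String functionName = "tree_sitter_" + languageName;
        MemorySegment address = lookup.find(functionName)
                .orElseThrow(() -> new IllegalArgumentException(
                        "Language function '%s' not found".formatted(functionName)));
        MethodHandle function = Linker.nativeLinker()
                .downcallHandle(address, FunctionDescriptor.of(ValueLayout.ADDRESS));
        try {
            return (MemorySegment) function.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException("Failed to call language function", t);
        }
    }
}

// Hypothetical test double standing in for a parser library that exports
// a single function named tree_sitter_fake.
final class FakeParserLibrary {
    private FakeParserLibrary() {}

    static SymbolLookup create(Arena arena, MemorySegment languageStruct) {
        // A method handle of type ()MemorySegment that always returns languageStruct.
        MethodHandle target = MethodHandles.constant(MemorySegment.class, languageStruct);
        // Expose it as a native `void *(void)` function pointer, valid while the arena is open.
        MemorySegment stub = Linker.nativeLinker()
                .upcallStub(target, FunctionDescriptor.of(ValueLayout.ADDRESS), arena);
        return name -> name.equals("tree_sitter_fake") ? Optional.of(stub) : Optional.empty();
    }
}
```

Because downcallHandle and upcallStub are restricted methods, JDK 22 prints a warning unless the JVM is started with --enable-native-access=ALL-UNNAMED. With a real parser, the lookup would instead come from SymbolLookup.libraryLookup(...).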

The Javadoc here intentionally refers to tree-sitter-python to reduce confusion and to indicate that the method works with any parser. Otherwise a user might confuse tree-sitter-java with java-tree-sitter / jtreesitter, or think that jtreesitter only works with the Java parser.

@ObserverOfTime
Member

> But would it make sense nonetheless to add a generic loadLanguage method here

Only until the bindings are autogenerated, at which point it'll be deprecated.

Development

No branches or pull requests

2 participants