You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It should be elif callable(tokenizer_or_token_counter):
The test fails because BaseEmbeddings.embed is a method, which inspect.isfunction does not consider to be a function.
The symptom of the crash was in the line Tokenizer backend {str(type(self.tokenizer))} not supported where it referred to my class and its method embed.
FYI, here is my class, I don't really know yet if it's correct, but I know that my change allowed it to run:
Chonkie had that line as a callable before but the reason why it was changed to the current method is because all tokenizers are technically callable since they implement the __call__ fn in their classes, which causes that conditional to error out.
If you can do one check, that would be great! Could you see by adding the get_tokenizer_or_token_counter() funtion into your class as:
I haven't use the other tokenizers yet, so I didn't know how they worked. However, I can confirm that your suggestion also didn't work. I get a different error: ValueError: Tokenizer backend <class 'method'> not supported
I think the underlying issue is the same, relating to inspect.isfunction only returning true for a function that's not bound to a class. This does work as a substitute for callable, with or without with your suggested edit: inspect.ismethod(tokenizer_or_token_counter)
Sorry for the delay and thanks for trying it out; ismethod also sounds like a good option. I'll try evaluating the options we have and get a fix pushed ASAP.
I tried to write a BaseEmbeddings implementation and it crashed due to this line:
chonkie/src/chonkie/chunker/base.py
Line 34 in 9904935
It should be
elif callable(tokenizer_or_token_counter):
The test fails because BaseEmbeddings.embed is a method, which
inspect.isfunction
does not consider to be a function.The symptom of the crash was in the line
Tokenizer backend {str(type(self.tokenizer))} not supported
where it referred to my class and its methodembed
.FYI, here is my class, I don't really know yet if it's correct, but I know that my change allowed it to run:
The text was updated successfully, but these errors were encountered: