Replies: 4 comments 4 replies
-
Hi @joancf, thanks for the suggestion! When you refer to Java GATE doing this, I assume you mean the segment processing PR? I think functionality like that would be a great addition for gatenlp as well, and I have added an FR issue for this:
-
I'm not sure whether it's a "segment PR". I used GATE for some years, but not in the last 5 years :-( . Now, in Python, there is a new opportunity to use GATE again ;-) along with the way it manages documents and annotations. I'm just describing how I remember it working.
-
The most generic way to run the tests should be to run or just run For this, obviously spacy, the spacy model and pytest need to be installed into your environment. The proper way to install from the repo would be
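Before running the tests, it can help to check that the prerequisites mentioned above are importable. The following is only a minimal sketch using the standard library: the helper name `missing_modules` is hypothetical, and `en_core_web_sm` is an assumed model name, since the thread does not say which spaCy model the tests need.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# "en_core_web_sm" is an assumption; replace it with the model the tests actually use.
required = ["spacy", "pytest", "en_core_web_sm"]
print("Missing:", missing_modules(required))
```

An empty list means all three can be imported; anything listed still has to be installed into the environment.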
-
I will try to explain by adding code, and then we can open an issue. 1: How to pass parameters to spaCy components (using component_cfg... ). For example, here I have the definition of a new component
When a language factory has parameters the way to pass them is adding the
From gatenlp we know neither which components are in the pipeline nor which parameters they take, so one simple solution (it could be more sophisticated) is to pass all the annotation features to a given component
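As an illustration of point 1, here is a minimal sketch, not the code from the original comment. It assumes spaCy v3, where `nlp(text, component_cfg={...})` forwards keyword arguments to the named component's `__call__`. The component `segment_tagger`, its `kind` parameter, and the `segment_kind` extension are all hypothetical names made up for this example; the `features` dict stands in for the features of a gatenlp annotation.

```python
from typing import Optional

import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical extension attribute, used only for this illustration.
if not Doc.has_extension("segment_kind"):
    Doc.set_extension("segment_kind", default=None)


class SegmentTagger:
    """Toy component: records a 'kind' on the Doc, overridable per call."""

    def __init__(self, default_kind: str):
        self.default_kind = default_kind

    def __call__(self, doc: Doc, kind: Optional[str] = None) -> Doc:
        # 'kind' arrives through component_cfg when the pipeline is called.
        doc._.segment_kind = kind if kind is not None else self.default_kind
        return doc


@Language.factory("segment_tagger", default_config={"default_kind": "body"})
def create_segment_tagger(nlp: Language, name: str, default_kind: str):
    return SegmentTagger(default_kind)


nlp = spacy.blank("en")  # blank pipeline, no trained model required
nlp.add_pipe("segment_tagger")

# Stand-in for the features of an annotation; passed as keyword arguments.
features = {"kind": "title"}
doc_with_features = nlp("An example title", component_cfg={"segment_tagger": features})
doc_default = nlp("Some body text")

print(doc_with_features._.segment_kind, doc_default._.segment_kind)
```

Passing all features blindly only works if the component accepts every key as a keyword argument, so a real integration would probably need to filter the features or have the component take `**kwargs`.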
**2: How to get annotations from different spaCy components (apart from the token, sentence...)**
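For point 2, a minimal sketch of one possible approach, not the code from the original comment: it assumes spaCy v3 and collects character offsets and labels from `doc.ents` and from every span group in `doc.spans`. The spans are set by hand here as a stand-in for what real components would produce, and `collect_annotations` and the group name `noun_groups` are hypothetical.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # blank pipeline, no trained model required
doc = nlp("Berlin is the capital of Germany")

# Stand-in for the output of real components (set manually for the example).
doc.ents = [Span(doc, 0, 1, label="GPE"), Span(doc, 5, 6, label="GPE")]
doc.spans["noun_groups"] = [Span(doc, 2, 4, label="NP")]


def collect_annotations(spacy_doc):
    """Gather (start_char, end_char, label) triples per annotation source."""
    found = {
        "ents": [(e.start_char, e.end_char, e.label_) for e in spacy_doc.ents]
    }
    # doc.spans is dict-like: one named span group per producing component.
    for key, group in spacy_doc.spans.items():
        found[key] = [(s.start_char, s.end_char, s.label_) for s in group]
    return found


annotations = collect_annotations(doc)
for source, triples in annotations.items():
    print(source, [(doc.text[s:e], label) for s, e, label in triples])
```

Because the triples use character offsets, they could be turned into offset-based annotations on the gatenlp side, with one annotation type per source key.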
And finally, we can also retrieve the "spaCy document features" as GATE features. If we process spans, these features will become span features:
So the document features are copied to the current span.
-
In Java GATE, an annotator can be applied to a given subset of the document (to some annotations), for example the title or the index of a document.
In the Python version, it seems that it must be run over the full document. Of course, an annotator could introduce its own restrictions, but for a general-purpose annotator provided by the project this is not possible.
So if I want to use spaCy, I have to run it over the full document. I might instead want to run my own sentence splitter and then run spaCy sentence by sentence, or apply different pipelines to different parts of the document.
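To make the idea concrete, here is a rough sketch in plain Python (not gatenlp's or GATE's actual API): the annotator only ever sees the text of a segment, and the offsets it returns are shifted back into document coordinates. The names `annotate_segments` and `word_annotator` are hypothetical, and the regex tokenizer is just a stand-in for a real annotator such as a spaCy pipeline.

```python
import re
from typing import Callable, Iterable, List, Tuple

# (start, end, type), with offsets relative to the text the annotator was given.
Ann = Tuple[int, int, str]


def word_annotator(text: str) -> List[Ann]:
    """Toy annotator: one 'Token' annotation per run of word characters."""
    return [(m.start(), m.end(), "Token") for m in re.finditer(r"\w+", text)]


def annotate_segments(
    text: str,
    segments: Iterable[Tuple[int, int]],
    annotator: Callable[[str], List[Ann]],
) -> List[Ann]:
    """Run the annotator on each (start, end) segment, returning document offsets."""
    results: List[Ann] = []
    for seg_start, seg_end in segments:
        for start, end, ann_type in annotator(text[seg_start:seg_end]):
            # Shift segment-relative offsets back into document coordinates.
            results.append((seg_start + start, seg_start + end, ann_type))
    return results


document = "Title here. Body text follows."
body_only = annotate_segments(document, [(12, len(document))], word_annotator)
print([document[s:e] for s, e, _ in body_only])
```

Different annotators could be passed for different segment lists, which would cover the "different pipelines for different parts" case.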
Thanks for the tool, it looks great!