GLiNER.py
# From https://towardsdatascience.com/extract-any-entity-from-text-with-gliner-32b413cea787
"""
GLiNER (Generalist Model for Named Entity Recognition) is an NER model
built on a bidirectional transformer encoder similar to BERT.
Traditional NER models are restricted to a predefined set of entity types, and Large Language Models (LLMs) are resource-intensive;
GLiNER offers a practical alternative for resource-limited scenarios.
Technically, GLiNER uses a BiLM (Bidirectional Transformer Language Model) that takes the entity type prompts together with the sentence or text as input.
Each entity type in the prompt is marked by a learned token, [ENT], and the BiLM produces a vector representation (embedding) for every token.
The entity embeddings and the input word embeddings are then fed into neural networks,
with dedicated layers that build representations of spans (windows of consecutive tokens).
A matching score between each entity representation and each span representation is then computed as a dot product followed by a sigmoid activation.
Unlike autoregressive models such as GPT-3.5 and GPT-4, GLiNER relies on smaller bidirectional language models such as BERT or DeBERTa.
This lets it use context on both sides of a token while avoiding the scalability challenges of large autoregressive models.
At the time of its introduction, GLiNER outperformed both ChatGPT and fine-tuned LLMs on zero-shot NER benchmarks.
Being lightweight, scalable, fast, and accurate, it offers NER with no API usage cost
and with better efficiency than both LLM-based and traditional approaches.
Even the smaller versions of GLiNER outperform ChatGPT across various NER tasks.
Other models do better in some specific domains, but the performance gap is marginal.
GLiNER is available in several sizes; the small version has 50 million parameters (approximately 600MB).
It is a promising solution for common NER challenges and has the potential to become a primary choice for NER tasks.
Limitations of GLiNER
1. The model was trained on unbalanced classes:
   some classes are more frequent than others, and the authors would like to improve how these classes are identified and handled
   with a dedicated loss function.
2. Multilingual skills need to be improved:
   training in more languages is needed (currently Italian is poorly supported).
3. It relies on a fixed similarity threshold: in the future the authors would like to make it dynamic, so as to capture as many entities
   as possible without distorting the results.
"""
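# The scoring step described in the docstring above can be sketched in a few lines.
# This is a minimal illustration, not GLiNER's actual implementation: the helper names
# (sigmoid, match_scores) and the 3-dimensional embeddings are made up for the example.
# Only the idea "dot product, then sigmoid, then compare with a threshold" comes from
# the description.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def match_scores(span_vecs, entity_vecs):
    """Return scores[i][j] = sigmoid(dot(span_i, entity_j))."""
    return [
        [sigmoid(sum(s * e for s, e in zip(span, ent))) for ent in entity_vecs]
        for span in span_vecs
    ]


# Toy, hand-picked embeddings (assumption: real ones come from the encoder).
span_vecs = {
    "Cristiano Ronaldo": [2.0, 0.1, -1.0],
    "Ballon d'Or": [-1.0, 2.5, 0.0],
}
entity_vecs = {
    "person": [1.5, 0.0, -0.5],
    "award": [-0.5, 1.5, 0.0],
}

scores = match_scores(list(span_vecs.values()), list(entity_vecs.values()))

# Keep every (span, label) pair whose score clears the threshold.
threshold = 0.5
matches = [
    (span, label)
    for span, row in zip(span_vecs, scores)
    for label, score in zip(entity_vecs, row)
    if score > threshold
]
for span, label in matches:
    print(span, "=>", label)
```

# With these toy vectors each span clears the threshold for exactly one label. Raising
# or lowering the threshold changes how many pairs survive, which is the trade-off
# mentioned in the limitations.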
# Install the package first (shell command, or prefix with "!" in a notebook cell):
#   pip install gliner
from gliner import GLiNER
model = GLiNER.from_pretrained("urchade/gliner_base")
text = """
Cristiano Ronaldo dos Santos Aveiro
(Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985)
is a Portuguese professional footballer who plays as a forward for
and captains both Saudi Pro League club Al Nassr and the Portugal national
team. Widely regarded as one of the greatest players of all time,
Ronaldo has won five Ballon d'Or awards, a record three
UEFA Men's Player of the Year Awards, and four European Golden Shoes,
the most by a European player. He has won 33 trophies in his career,
including seven league titles, five UEFA Champions Leagues,
the UEFA European Championship and the UEFA Nations League.
Ronaldo holds the records for most appearances (183), goals (140)
and assists (42) in the Champions League, goals in the
European Championship (14), international goals (128) and
international appearances (205).
He is one of the few players to have made over 1,200 professional
career appearances, the most by an outfield player,
and has scored over 850 official senior career goals for club and country,
making him the top goalscorer of all time.
"""
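# Entity types to extract; the label names are free-form prompts chosen by the user.
labels = ["person", "award", "date", "competitions", "teams"]
# Keep only predictions whose score is above the threshold.
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])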
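
# The flat list printed above is often easier to read when grouped by label. The sketch
# below is one possible way to do that. It assumes each prediction is a dict with "text",
# "label", and "score" keys; the helper name group_by_label and the sample list (including
# its scores) are hypothetical, so the block runs without downloading the model.

```python
from collections import defaultdict

# Hypothetical sample shaped like the prediction list; scores are made up.
sample_entities = [
    {"text": "Cristiano Ronaldo dos Santos Aveiro", "label": "person", "score": 0.95},
    {"text": "Al Nassr", "label": "teams", "score": 0.88},
    {"text": "Ronaldo", "label": "person", "score": 0.91},
    {"text": "Ballon d'Or", "label": "award", "score": 0.84},
    {"text": "Ronaldo", "label": "person", "score": 0.90},
    {"text": "5 February 1985", "label": "date", "score": 0.31},
]


def group_by_label(entities, min_score=0.5):
    """Group unique entity texts by label, dropping low-score predictions."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue
        texts = grouped[ent["label"]]
        if ent["text"] not in texts:  # skip repeated mentions
            texts.append(ent["text"])
    return dict(grouped)


grouped = group_by_label(sample_entities, min_score=0.5)
for label, texts in grouped.items():
    print(label, "=>", texts)
```

# To apply this to real output, pass the `entities` list from the prediction call instead
# of `sample_entities`.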