Commit b2e8cfb

Named Entity Resolution with dslim/distilbert-NER
1 parent 2f82a45 commit b2e8cfb

1 file changed

+77
-0
lines changed

llms/bert-ner.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# Named Entity Recognition with dslim/distilbert-NER

I was exploring the original BERT model from 2018, which is mainly useful as a base that you fine-tune for a specific task.

[dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER) by David S. Lim is a popular example of this: a DistilBERT model fine-tuned for named entity recognition, with around 20,000 downloads from Hugging Face every month.

I tried the demo from the README but it didn't quite work: it complained about an incompatibility with NumPy 2.0.

So I used `uv run --with 'numpy<2.0'` to run it in a temporary virtual environment. Here's a Bash one-liner that demonstrates the model:

```bash
uv run --with 'numpy<2.0' --with transformers python -c '
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
import json
model = AutoModelForTokenClassification.from_pretrained("dslim/distilbert-NER")
tokenizer = AutoTokenizer.from_pretrained("dslim/distilbert-NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
text = "This is an example sentence about Simon Willison who lives in Half Moon Bay"
print(json.dumps(nlp(text), indent=2, default=repr))'
```

The first time you run this it will download 250MB to your `~/.cache/huggingface/hub/models--dslim--distilbert-NER` folder.
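
One detail in the one-liner worth unpacking is `default=repr`. The pipeline returns each score as a NumPy `float32`, which `json.dumps` can't serialize on its own, so `default=repr` falls back to the value's string representation. That is why the scores in the example output appear as quoted strings. Here's a minimal sketch of that behaviour. It uses a hypothetical `Float32Like` stand-in class (an assumption for illustration, so the snippet runs without NumPy installed) and shows `default=float` as one possible alternative if you'd rather get plain JSON numbers:

```python
import json


class Float32Like:
    """Hypothetical stand-in for numpy.float32: not JSON serializable by default."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return repr(self.value)

    def __float__(self):
        return float(self.value)


result = [{"entity": "B-PER", "score": Float32Like(0.9982101), "word": "Simon"}]

# default=repr turns anything json can't handle into its repr() string
as_strings = json.dumps(result, default=repr)

# default=float converts it to a plain Python float instead
as_numbers = json.dumps(result, default=float)

print(as_strings)
print(as_numbers)
```

With a real `float32` the `default=float` version may print more decimal places than the `repr` version, since the value is widened to a 64-bit float.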

Example output from the one-liner:

```json
[
  {
    "entity": "B-PER",
    "score": "0.9982101",
    "index": 7,
    "word": "Simon",
    "start": 34,
    "end": 39
  },
  {
    "entity": "I-PER",
    "score": "0.99835676",
    "index": 8,
    "word": "Willis",
    "start": 40,
    "end": 46
  },
  {
    "entity": "I-PER",
    "score": "0.9977602",
    "index": 9,
    "word": "##on",
    "start": 46,
    "end": 48
  },
  {
    "entity": "B-LOC",
    "score": "0.99432063",
    "index": 13,
    "word": "Half",
    "start": 62,
    "end": 66
  },
  {
    "entity": "I-LOC",
    "score": "0.99325883",
    "index": 14,
    "word": "Moon",
    "start": 67,
    "end": 71
  },
  {
    "entity": "I-LOC",
    "score": "0.9919292",
    "index": 15,
    "word": "Bay",
    "start": 72,
    "end": 75
  }
]
```
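
The output is one entry per token, not per entity: "Willison" is split into the word pieces `Willis` and `##on`, and the `B-` and `I-` prefixes mark the beginning and continuation of an entity. Here's a rough sketch of stitching those tokens back into whole entities using the `start` and `end` character offsets. The `group_entities` function is a hypothetical helper written for this note, not part of transformers, and it assumes that an `I-` token always continues the most recent span with the same label:

```python
def group_entities(tokens, text):
    """Merge B-/I- tagged tokens into (label, text) pairs using character offsets."""
    spans = []
    for token in tokens:
        prefix, _, label = token["entity"].partition("-")
        # An I- tag with the same label as the open span extends that span
        if prefix == "I" and spans and spans[-1]["label"] == label:
            spans[-1]["end"] = token["end"]
        else:
            # A B- tag (or an unexpected I- tag) starts a new span
            spans.append({"label": label, "start": token["start"], "end": token["end"]})
    # Slice the original text so word pieces like "##on" never appear in the result
    return [(s["label"], text[s["start"]:s["end"]]) for s in spans]


text = "This is an example sentence about Simon Willison who lives in Half Moon Bay"
# Trimmed copy of the example output: only the keys the function needs
tokens = [
    {"entity": "B-PER", "start": 34, "end": 39},
    {"entity": "I-PER", "start": 40, "end": 46},
    {"entity": "I-PER", "start": 46, "end": 48},
    {"entity": "B-LOC", "start": 62, "end": 66},
    {"entity": "I-LOC", "start": 67, "end": 71},
    {"entity": "I-LOC", "start": 72, "end": 75},
]
entities = group_entities(tokens, text)
print(entities)
# [('PER', 'Simon Willison'), ('LOC', 'Half Moon Bay')]
```

The transformers `pipeline()` also accepts an `aggregation_strategy` argument (for example `"simple"`) that is meant to do this kind of grouping for you, which is probably the better option outside of a quick experiment.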
