NER - caught in the middle of an entity, starting with "I-" instead of "B-" #1699

Open
ghnp5 opened this issue Jan 31, 2025 · 2 comments
ghnp5 commented Jan 31, 2025

Hi,

This relates to #1657.
Model: ner_mult_long_demo

For several text inputs, the model returns entity tags that start with "I-" rather than "B-", so the entity has no opening "B-" tag.

Example:

This is a public service announcement.

This returns:

[[["This","is","a","public","service","announcement","."],["O","O","O","O","I-EVENT_NAME","I-EVENT_NAME","O"]]]

See how there's no B-EVENT_NAME.
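
To make the problem easy to spot across many inputs, here is a minimal sketch that flags "I-" tags which do not continue an entity of the same type. It assumes the tags follow the usual B-/I-/O scheme shown in the output above; the function name `find_orphan_inside_tags` is hypothetical and not part of DeepPavlov.

```python
from typing import List


def find_orphan_inside_tags(tags: List[str]) -> List[int]:
    """Return indices of I- tags that do not continue an entity of the same type."""
    orphans = []
    prev_type = None  # entity type of the previous token, None if it was "O"
    for i, tag in enumerate(tags):
        if tag == "O":
            prev_type = None
            continue
        prefix, _, entity_type = tag.partition("-")
        # An I- tag is only valid right after B-X or I-X of the same type X.
        if prefix == "I" and entity_type != prev_type:
            orphans.append(i)
        prev_type = entity_type
    return orphans


tags = ["O", "O", "O", "O", "I-EVENT_NAME", "I-EVENT_NAME", "O"]
print(find_orphan_inside_tags(tags))  # [4]
```

Only the first "I-EVENT_NAME" is reported, because the second one does continue an entity of the same type.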


docker-compose.yml:

  ner:
    container_name: ner
    image: deeppavlov/deeppavlov
    environment:
      - CONFIG=ner_mult_long_demo
    restart: always
    volumes:
      - ./deeppavlov/ner_mult_long_demo.json:/usr/local/lib/python3.10/site-packages/deeppavlov/configs/classifiers/ner_mult_long_demo.json
      - ./deeppavlov/sentence_delimiter.py:/usr/local/lib/python3.10/site-packages/deeppavlov/models/tokenizers/sentence_delimiter.py
      - ./deeppavlov/pysbd.txt:/usr/local/lib/python3.10/site-packages/deeppavlov/requirements/pysbd.txt
      - ./deeppavlov/registry.json:/usr/local/lib/python3.10/site-packages/deeppavlov/core/common/registry.json
      - ./deeppavlov/requirements_registry.json:/usr/local/lib/python3.10/site-packages/deeppavlov/core/common/requirements_registry.json
      - ./data:/root/.deeppavlov
      - ./venv:/venv
    entrypoint:
      - /bin/sh
      - -c
      - |
        /usr/local/bin/python3.10 -m pip install pysbd==0.3.4
        python -m deeppavlov riseapi ner_mult_long_demo -p 5000 -d

This doesn't happen in the demo on the website, nor when I use the model ner_demo_mdeberta_address.
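
For anyone who wants to reproduce this against the container, the following is a rough sketch of a client. Several things here are assumptions and not taken from the setup above: that port 5000 is reachable at `http://localhost:5000` (the compose fragment does not publish a port), that the REST endpoint is `/model`, and that the input argument is named `x`. The helper names `build_request` and `query_ner` are hypothetical.

```python
import json
import urllib.request
from typing import List


def build_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request for the NER service (endpoint and key are assumptions)."""
    payload = json.dumps({"x": texts}).encode("utf-8")  # assumed argument name: "x"
    return urllib.request.Request(
        base_url.rstrip("/") + "/model",  # assumed endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_ner(base_url: str, texts: List[str], timeout: float = 30.0):
    """Send the texts to the service and return the decoded JSON response."""
    request = build_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the service to be running and reachable):
# print(query_ner("http://localhost:5000", ["This is a public service announcement."]))
```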

@ghnp5 ghnp5 added the bug label Jan 31, 2025

ghnp5 commented Jan 31, 2025

Other examples:

This is not a political debate either!

[[["This","is","not","a","political","debate","either","!"],["O","O","O","O","O","I-EVENT_NAME","O","O"]]]

Status Examination. . . . . . .

[[["Status","Examination",".",".",".",".","",".",".","","."],["O","I-EVENT_NAME","O","O","O","O","O","O","O","O","O"]]]

I broke my new years resolution.

[[["I","broke","my","new","years","resolution","."],["O","O","O","O","I-EVENT_NAME","O","O"]]]


ghnp5 commented Jan 31, 2025

It can happen with ner_demo_mdeberta_address too, even in the demo on the website.

(Please ignore the meaning of the sentences. They are random sentences from my database, tweaked slightly to avoid keeping the original data while still reproducing the bug.)

Why should Old Man Felon Insurrectionist-Cry-Baby-smith hire

[[
  ["Why","should","Old","Man","Felon","Insurrectionist","-","Cry","-","Baby","-","smith","hire"],
  ["O","O","B-PERSON","I-PERSON","O","I-PERSON","I-PERSON","I-PERSON","I-PERSON","I-PERSON","I-PERSON","I-PERSON","O"]
]]

--

Why should Old Man Felon Insurrectionist-Cry-Baby-eric hire

[[
  ["Why","should","Old","Man","Felon","Insurrectionist","-","Cry","-","Baby","-","eric","hire"],
  ["O","O","O","O","O","O","O","O","O","O","O","I-PERSON","O"]
]]

--

In this case, the entity starts as PERSON, switches to WORK_OF_ART in the middle, and then switches back to PERSON, with no "B-" tag at either switch:

Why should Old Man Crazy & Felon Insurrectionist-Cry-Baby-smith hire

[[
  ["Why","should","Old","Man","Crazy","&","Felon","Insurrectionist","-","Cry","-","Baby","-","smith","hire"],
  ["O","O","O","O","B-PERSON","I-WORK_OF_ART","I-WORK_OF_ART","I-WORK_OF_ART","I-WORK_OF_ART","I-WORK_OF_ART","I-PERSON","I-PERSON","I-PERSON","I-PERSON","O"]
]]
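
Until the model output itself is fixed, a possible client-side workaround is to normalise the tags after prediction. The sketch below rewrites every "I-" tag that does not continue an entity of the same type into a "B-" tag. This is only an assumption about what a reasonable repair looks like, not the maintainers' fix, and `repair_bio_tags` is a hypothetical helper. It does not correct wrong entity types or wrongly split entities; it only makes the sequence well-formed.

```python
from typing import List


def repair_bio_tags(tags: List[str]) -> List[str]:
    """Turn any I-X tag that does not follow B-X or I-X into B-X."""
    repaired = []
    prev_type = None  # entity type of the previous token, None if it was "O"
    for tag in tags:
        if tag == "O":
            prev_type = None
            repaired.append(tag)
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type != prev_type:
            tag = "B-" + entity_type  # open a new entity instead of continuing one
        repaired.append(tag)
        prev_type = entity_type
    return repaired


tags = ["O", "O", "O", "O", "B-PERSON"] + ["I-WORK_OF_ART"] * 5 + ["I-PERSON"] * 4 + ["O"]
print(repair_bio_tags(tags))
```

For the last example above, this yields three separate entities (PERSON, WORK_OF_ART, PERSON), each opened by its own "B-" tag.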
