-
Hi @pj-ml, we're always open to suggestions for improving our prompts 🙂 Have you benchmarked this against spacy.NER.v3?
-
I have not benchmarked this against spacy.NER.v3. For context: my coding skills are OK, but a good software engineer with domain expertise would solve this problem 10x faster than I can, and with higher-quality code to show for it. My statistics background is probably more useful here.

Here is how I think about the NER problem theoretically. LLMs are conditioned heavily on their input tokens. For an LLM, the cost of generating an incorrect token (especially early in the output sequence) is very high, because that incorrect token affects the accuracy of virtually all future generated tokens. Put another way: when an LLM generates an incorrect token, it corrupts the conditioning context X (the prompt plus the output so far) in P(Y|X) for virtually all future tokens, as long as the incorrect token remains in the context window. This can lead to a runaway scenario of "incorrect" token generation, or to repeated mistakes.

Another advantage of the method I proposed is "part-of-sequence" attention. Ideally, with NER, you want the model to extract the relevant entities from the entire sequence. Sometimes this sequence is long. The longer the sequence, the harder it is for the model to attend to all of the relevant entities simultaneously, unless you guide the attention with pinpoint accuracy using a prompt. When you re-generate your input sequence as part of your NER extraction answer, you fully focus the model's attention on the part of the sequence currently relevant to entity extraction. You also ensure that each part of your input sequence gets the model's full attention at some point during output generation. Without guiding the attention in this way, you are playing attention roulette, where the odds of a win (win = extracting an entity correctly) decrease with increasing input sequence length, probably very drastically once you start to hit the limits of the number of tokens the model can attend to simultaneously.

The last advantage is that because you are re-generating the input sequence, the model is forced to pay attention to the context around the entity it is extracting. The entity is less likely to be extracted in isolation, avoiding situations where the model misses that the surrounding context modified the entity's category.

The disadvantage is that if you extract entities in this way (and especially if the list of entities is long), the model might fall into a probability rut, using only a subset of the entities when extracting.

@rmitsch, you probably know more about the above than I do, so it would be interesting to hear your thoughts and where my reasoning might be flawed. As for the solution I proposed, I am sure there are papers somewhere on this method, but I haven't found them. To assess the proposed method fairly, the papers would have to use the latest SOTA open-source LLMs. I will continue searching for papers and let you know once I find something.
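To illustrate what "re-generating the input as part of the answer" might look like, here is a hypothetical input/output pair. The sentence, the labels, and the exact `{LABEL~~entity}` marker syntax are assumptions made up for illustration, loosely following the `{ENT~~}` notation used later in this thread:

```
Input:  Apple acquired Shazam in 2018.
Output: {ORG~~Apple} acquired {ORG~~Shazam} in {DATE~~2018}.
```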
-
I hope this is helpful and, if so, a good starting point for constructing a new NER framework for LLMs. It could definitely do with some refinement. This is my first suggestion on GitHub ever, so please go easy on me.
This suggestion works as follows. Suppose we want to perform NER on the `text_to_label` sentence: for every generated token, we force the token to be either the start of our "{ENT~~}" sequence or the next token in our input text. Suppose each token is only one character; we can then constrain the LLM's next output token to be the most likely option between "Q" and "{". These constraints would likely improve NER quality, at the cost of generating more tokens (or, in some cases, fewer tokens).
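To make this concrete, here is a minimal, character-level sketch of the constrained decoding loop described above. The `lm_score` function and the example sentence are hypothetical stand-ins: a real implementation would query an actual LLM's next-token probabilities and would also enforce the full `{ENT~~}` marker grammar, which is elided here.

```python
MARKER_OPEN = "{"  # first character of the "{ENT~~}" entity marker

def constrained_decode(text_to_label, lm_score):
    """Re-generate `text_to_label` character by character. At every
    step the model may only choose between (a) the next input
    character or (b) MARKER_OPEN, which would commit it to emitting
    a full entity marker (marker grammar elided in this sketch)."""
    output = []
    i = 0
    while i < len(text_to_label):
        next_char = text_to_label[i]
        # Ask the (hypothetical) LM to score only the two legal tokens.
        scores = lm_score("".join(output), [next_char, MARKER_OPEN])
        if scores[MARKER_OPEN] > scores[next_char]:
            output.append(MARKER_OPEN)
            # ...a real decoder would now force "ENT~~", the entity
            # text, and the closing "}" before resuming the copy.
        else:
            output.append(next_char)  # copy the input verbatim
            i += 1
    return "".join(output)

if __name__ == "__main__":
    # Dummy scorer that never opens a marker, just to show the call
    # shape; with it, the function simply copies the input through.
    dummy = lambda prefix, cands: {c: float(c != MARKER_OPEN) for c in cands}
    print(constrained_decode("Quantum computing is fun.", dummy))
```

Because the scorer only ever sees the two legal candidates, the model cannot drift off the input text; the worst it can do is open a marker in the wrong place.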