-
Hi @pj-ml, we're always open to suggestions for improving our prompts 🙂 Have you benchmarked this against spacy.NER.v3?
-
I have not benchmarked this against spacy.NER.v3. For context: my coding skills are OK, but a good software engineer with domain expertise would solve this problem 10x faster than I can, and with higher-quality code to show for it. My statistics background is probably more useful here.

Here is how I think about the NER problem theoretically. LLMs are conditioned heavily on their input tokens. For an LLM, the cost of generating an incorrect token (especially early in the output sequence) is very high, because that incorrect token affects the accuracy of virtually all future generated tokens. Put another way: when an LLM generates an incorrect token, it corrupts the conditioning context X (the prompt plus the output so far) in P(Y|X) for virtually all future tokens, as long as the incorrect token remains in the context window. This can lead to a runaway scenario of "incorrect" token generation, or to repeated mistakes.

Another advantage of the method I proposed is "part-of-sequence" attention. Ideally, with NER, you want the model to extract the relevant entities from the entire sequence. Sometimes this sequence is long. The longer the sequence, the harder it is for the model to attend to all of the relevant entities simultaneously, unless you guide the attention with pinpoint accuracy using a prompt. When you re-generate your input sequence as part of your NER extraction answer, you fully focus the model's attention on the part of the sequence currently relevant to entity extraction. You also ensure that each part of your input sequence gets the model's full attention at some point during output generation. Without guiding the attention in this way, you are playing attention roulette, where the odds of a win (win = extracting an entity correctly) decrease with increasing input sequence length, probably very drastically once you start to hit the limits of the number of tokens the model can attend to simultaneously.

The last advantage is that because you are re-generating the input sequence, the model is forced to pay attention to the context around the entity it is extracting. The entity is less likely to be extracted in isolation, avoiding situations where the model misses that the surrounding context modified the entity's category.

The disadvantage is that if you extract entities in this way (and especially if the list of entities is long), the model might fall into a probability rut, using only a subset of the entities when extracting.

@rmitsch, you probably know more about the above than I do, so it would be interesting to hear your thoughts and where my reasoning might be flawed. As for the solution I proposed, I am sure there are papers somewhere on this method, but I haven't found them. To assess the proposed method fairly, the papers would have to use the latest SOTA open-source LLMs. I will continue searching for papers and let you know once I find something.
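To illustrate what "re-generating the input as part of the answer" might look like, here is a hypothetical input/output pair. The sentence, the labels, and the exact `{LABEL~~entity}` marker syntax are assumptions made up for illustration, loosely following the `{ENT~~}` notation used later in this thread:

```
Input:  Apple acquired Shazam in 2018.
Output: {ORG~~Apple} acquired {ORG~~Shazam} in {DATE~~2018}.
```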
-
I hope this is helpful and, if so, a good starting point for constructing a new NER framework for LLMs. It could definitely do with some refinement. This is my first suggestion on GitHub ever, so please go easy on me.
This suggestion works as follows. Suppose we want to perform NER on the `text_to_label` sentence: for every generated token, we force the token to be either the start of our "{ENT~~}" sequence or the next token in our input text. Suppose each token is only one character; we can then constrain the LLM's next output token to be the most likely option between "Q" and "{". These constraints would likely improve NER quality, at the cost of generating more tokens (or, in some cases, fewer tokens).
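To make this concrete, here is a minimal, character-level sketch of the constrained decoding loop described above. The `lm_score` function and the example sentence are hypothetical stand-ins: a real implementation would query an actual LLM's next-token probabilities and would also enforce the full `{ENT~~}` marker grammar, which is elided here.

```python
MARKER_OPEN = "{"  # first character of the "{ENT~~}" entity marker

def constrained_decode(text_to_label, lm_score):
    """Re-generate `text_to_label` character by character. At every
    step the model may only choose between (a) the next input
    character or (b) MARKER_OPEN, which would commit it to emitting
    a full entity marker (marker grammar elided in this sketch)."""
    output = []
    i = 0
    while i < len(text_to_label):
        next_char = text_to_label[i]
        # Ask the (hypothetical) LM to score only the two legal tokens.
        scores = lm_score("".join(output), [next_char, MARKER_OPEN])
        if scores[MARKER_OPEN] > scores[next_char]:
            output.append(MARKER_OPEN)
            # ...a real decoder would now force "ENT~~", the entity
            # text, and the closing "}" before resuming the copy.
        else:
            output.append(next_char)  # copy the input verbatim
            i += 1
    return "".join(output)

if __name__ == "__main__":
    # Dummy scorer that never opens a marker, just to show the call
    # shape; with it, the function simply copies the input through.
    dummy = lambda prefix, cands: {c: float(c != MARKER_OPEN) for c in cands}
    print(constrained_decode("Quantum computing is fun.", dummy))
```

Because the scorer only ever sees the two legal candidates, the model cannot drift off the input text; the worst it can do is open a marker in the wrong place.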