Annotating co-reference in text
Introduction
- Co-reference is a phenomenon in language where two words refer to exactly the same thing in the world.
- For the purposes of discussion, we will distinguish between two types of co-reference:
- Pronominal co-reference: a pronoun co-refers with something earlier in the text.
- For example,
- "He has a melanoma. It is in his second toe".
- The pronoun "it" co-refers with "melanoma". They are exactly the same thing.
- Lexical co-reference: a lexical item from an open word class, such as a noun, refers to something earlier in the text.
- For example,
- "He has a melanoma. The tumour is in his 2nd toe."
- The noun "tumour" co-refers with "melanoma". They are exactly the same thing.
- For every mention of an entity in text, annotators should record all of its co-references.
Pronominal co-reference
For the purposes of pronominal co-reference, annotators should consider the following non-exhaustive list of pronouns that are commonly used when referring to entities in the CLEF texts:
Type of pronoun | Examples |
Definite | it, they, them |
Demonstrative | this, that, these, those |
Interrogative | which, whose, what |
Possessive | whose, their |
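
As a rough illustration of how the table above might be used to pre-highlight candidate pronouns for an annotator, here is a minimal Python sketch. The names `PRONOUNS_BY_TYPE`, `pronoun_types` and `candidate_pronouns` are hypothetical and are not part of any CLEF tool. The word list is copied from the table and is non-exhaustive, and a match is only a candidate: words such as "that" or "which" are not always used as pronouns.

```python
import re
from typing import Dict, List, Tuple

# Pronoun classes copied from the table above (a non-exhaustive list).
PRONOUNS_BY_TYPE: Dict[str, List[str]] = {
    "Definite": ["it", "they", "them"],
    "Demonstrative": ["this", "that", "these", "those"],
    "Interrogative": ["which", "whose", "what"],
    "Possessive": ["whose", "their"],
}


def pronoun_types(token: str) -> List[str]:
    """Return every pronoun class in the table that lists this token."""
    word = token.lower()
    return [kind for kind, words in PRONOUNS_BY_TYPE.items() if word in words]


def candidate_pronouns(text: str) -> List[Tuple[int, str]]:
    """Return (character offset, word) for each word that appears in the table.

    These are only candidates; the annotator still decides whether the
    word really co-refers with an earlier entity.
    """
    return [
        (match.start(), match.group())
        for match in re.finditer(r"[A-Za-z]+", text)
        if pronoun_types(match.group())
    ]


example = "He has a melanoma. It is in his second toe."
candidates = candidate_pronouns(example)
```

On the example sentence this flags only "It"; "his" is not flagged because it does not appear in the table.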
Lexical co-reference
- Lexical co-reference holds between two words for the same thing. Commonly, this co-reference is between:
- Synonyms
- For example,
- "Haemoglobin was 7.5g/dl. Given the Hb, further treatment was postponed until after transfusion"
- "Haemoglobin" and "Hb" are synonymous. The two words will be co-referred.
- A specific word and a more general form (hypernyms)
- For example,
- "There was a mass in his 2nd toe. The digit was excised."
- "digit" is referring to the same thing as "toe", in a general sense. The two words will be co-referred.
Using domain knowledge
- Co-reference will be annotated even when understanding it depends on knowledge of the domain.
- Lexical co-reference in particular often requires more domain knowledge to understand. Such co-references are not always obvious to the non-expert, although they may be guessed from clues in the text.
- Here are some examples of lexical and pronominal co-reference to illustrate the use of domain knowledge:
- "He has a melanoma. The tumour is in his 2nd toe."
- implies a co-reference between melanoma and tumour.
- understanding the co-reference requires knowledge that a melanoma is a kind of tumour.
- the co-reference will be annotated.
- "He has a melanoma. It is in his 2nd toe."
- implies a co-reference between melanoma and it.
- understanding the co-reference requires no domain knowledge.
- the co-reference will be annotated.
- "X-ray showed a mass. It was excised."
- there are two possible co-references:
- "X-ray" and "it"
- "mass" and "it"
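- Choosing between them requires domain knowledge: a mass can be excised, but an X-ray cannot. The co-reference between "mass" and "it" will be annotated.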
- "X-ray showed a mass in the left lobe. It was excised."
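- there are three possible co-references:
- "X-ray" and "it"
- "mass" and "it"
- "left lobe" and "it"
- Domain knowledge rules out "X-ray" and helps the annotator choose the most likely of the remaining candidates. The chosen co-reference will be annotated.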
- Some entities are inherently co-referential. For example, a patient has one abdomen. Two abdomen mentions in the text will most likely refer to the same entity. The resolution of this coreference also requires domain knowledge. It will be marked.
- In other cases, co-reference depends on the meaning of the text. For example, two mentions of an x-ray in a text may or may not refer to the same investigation. The co-reference will be marked if it can be inferred.
Co-reference and conjunctions
Sometimes, a single word might refer back to several things in a previous sentence. Co-reference should not be annotated in this case.
- For example,
- "He is suffering from mild headaches and from back pain. These are being treated with ibuprofen."
- The pronoun "these" is referring to "headaches and back pain".
- However, we do not mark a single "headaches and back pain" Condition in the document. We mark two Conditions.
- The annotation tool currently has no way to represent a single co-reference to two things like this.
- The co-reference should therefore not be created.
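
The rule above can be sketched in Python as follows. This is a toy model, not the annotation tool itself: the `Mention` class and the `make_link` function are hypothetical names, and the sketch assumes that a co-reference link always joins exactly two mentions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mention:
    text: str  # the annotated span, e.g. "headaches"
    kind: str  # entity type, e.g. "Condition", or "Pronoun" for a referring word


def make_link(
    referring: Mention, antecedents: List[Mention]
) -> Optional[Tuple[Mention, Mention]]:
    """Return an (antecedent, referring) pair, or None if no link is created.

    A link joins exactly two mentions, so a word that refers back to
    several separately annotated entities gets no link at all.
    """
    if len(antecedents) != 1:
        return None
    return (antecedents[0], referring)


headaches = Mention("headaches", "Condition")
back_pain = Mention("back pain", "Condition")
these = Mention("These", "Pronoun")

# "These" refers to two Conditions, so no co-reference is created.
no_link = make_link(these, [headaches, back_pain])

melanoma = Mention("melanoma", "Condition")
it = Mention("It", "Pronoun")

# "It" refers to a single Condition, so the link is created.
link = make_link(it, [melanoma])
```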
Co-reference and sets
Sometimes, a plural or a set of things (e.g. a patient's limbs) will be mentioned, and then, a little later, a single member of that set (e.g. their left leg). The two should not be coreferred. A single thing in the world is not the same as a set that contains that thing: your left leg is not the same as your four limbs.
- For example
- "Her finger nails show onycholysis. The nail of the left index is bleeding from the bed"
- In such cases, the set (finger "nails") and the individual ("nail" of the left index) should both be annotated as Loci.
- They should not, however, be coreferred.
- Drugs and their classes give similar examples. For example,
- "We will start empirical antibiotic therapy today. He will take Flucloxacillin and Metronidazole"
- "antibiotic" should not be coreferred to either "Flucloxacillin" or "Metronidazole"
- "antibiotic" is referring to both of them together, and possibly to other antibiotics as well. It is not the same as either one of them.
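
One way to picture the rule in this section is to treat each mention as denoting a set of real-world things, and to allow co-reference only when the two sets are identical. The Python sketch below is a toy model under that assumption. The identifiers such as `"nail:left-index"` and the function `same_referent` are invented for illustration, and an annotator would make this judgement by reading the text, not by computing sets.

```python
from typing import FrozenSet


def same_referent(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Two mentions may be coreferred only if they denote exactly the same things."""
    return a == b


# "Her finger nails": all ten nails (hypothetical identifiers).
finger_nails = frozenset(
    f"nail:{hand}-{finger}"
    for hand in ("left", "right")
    for finger in ("thumb", "index", "middle", "ring", "little")
)
# "The nail of the left index": one member of that set.
left_index_nail = frozenset({"nail:left-index"})

# "antibiotic" covers at least both drugs, and possibly others as well.
antibiotic_therapy = frozenset({"flucloxacillin", "metronidazole"})
flucloxacillin = frozenset({"flucloxacillin"})

# "melanoma" and "tumour" in the earlier example denote the same single thing.
melanoma = frozenset({"lesion:second-toe"})
tumour = frozenset({"lesion:second-toe"})

nails_corefer = same_referent(finger_nails, left_index_nail)       # set vs member
drugs_corefer = same_referent(antibiotic_therapy, flucloxacillin)  # class vs drug
lesion_corefer = same_referent(melanoma, tumour)                   # same thing
```

In this model the member is contained in the set (`left_index_nail <= finger_nails` holds), but containment is not identity, so the two mentions are not coreferred.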
Be aware of relationships that are not co-reference
- Two entities in the text must be clearly referring to the same thing in the real world to be coreferring. The fact that one thing implies another or causes another is not enough for them to corefer. They must be the same.
- For example
- "An opacity was seen compatible with gallstones"
- The opacity is on a film. The gallstones are in a body. They are not the same thing. They should not be coreferred.
- "A wall thickening consistent with acute colitis"
- The wall thickening implies colitis, but on its own, it is not colitis. They should not be coreferred.
- The fact that one thing is a part of another will not be marked as a co-reference.
- For example
- "He has a mass in his lung, in the left lobe"
- "lobe" and "lung" will not be co-referred.
Scope of co-reference
- Co-reference will be annotated where the referents are in the same document.
- Co-referents may be in the same sentence or in different sentences.
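
The scope rule can be expressed as a simple check. The sketch below is illustrative only: the `ScopedMention` class, its fields and the document identifiers are hypothetical, and the check covers scope alone, not whether the two mentions actually refer to the same thing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedMention:
    doc_id: str    # hypothetical document identifier
    sentence: int  # index of the sentence within the document
    text: str      # the annotated span


def in_scope(a: ScopedMention, b: ScopedMention) -> bool:
    """Co-reference is only considered within a single document.

    The sentence index is deliberately ignored: co-referents may be in
    the same sentence or in different sentences.
    """
    return a.doc_id == b.doc_id


melanoma = ScopedMention("doc-1", 0, "melanoma")
tumour = ScopedMention("doc-1", 1, "tumour")
other_tumour = ScopedMention("doc-2", 0, "tumour")

same_document = in_scope(melanoma, tumour)         # different sentences, same document
cross_document = in_scope(melanoma, other_tumour)  # different documents
```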