Annotating entities in text
The basic annotation unit within the CLEF corpus is the entity. Entities refer to real-world objects that are part of a patient's care and treatment: conditions, drugs, investigations, etc. Entities are grounded in the text of CLEF documents. The span of text that refers to an entity is a mention of that entity. An entity may be mentioned several times in the same document, and different mentions may refer to the same entity: "Mr. Jones's tumour... his melanoma... the lump".
Just as the entity is the basic unit of annotation, so marking up entities and mentions is the basic sub-task of the annotation process. In this sub-task, stretches of text are marked as mentions of an entity of a particular type. A co-reference link may then be created between mentions that refer to the same entity. This section describes, for each of the entity types, how annotators should map from the surface text to annotation:
- Which bits of text should be annotated?
- How should spans of text be mapped to mentions: which text should be included and excluded?
- How should special cases be dealt with?
- What information should be recorded for different entities?
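To make the relationship between entities, mentions and co-reference concrete, the following is a minimal sketch of one way the annotations could be represented as data. It is not the format used by the CLEF annotation tools: the class names `Mention` and `Entity`, the helper `mark`, the type label "Condition", the character-offset representation and the example sentence are all assumptions made for illustration. The choice of which words fall inside each span is likewise arbitrary here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """A span of document text that refers to an entity (hypothetical structure)."""
    start: int          # character offset of the first character in the span
    end: int            # character offset one past the last character
    entity_type: str    # e.g. "Condition"; the label itself is an assumption

    def text(self, document: str) -> str:
        return document[self.start:self.end]


@dataclass
class Entity:
    """A real-world object; its mentions are co-referent with one another."""
    entity_type: str
    mentions: list = field(default_factory=list)


def mark(document: str, phrase: str, entity_type: str) -> Mention:
    """Mark the first occurrence of `phrase` in `document` as a mention."""
    start = document.index(phrase)
    return Mention(start, start + len(phrase), entity_type)


# Invented example text, built around the three mentions quoted earlier.
DOCUMENT = (
    "Mr. Jones's tumour was reviewed. "
    "His melanoma was discussed. "
    "The lump was examined."
)

# One entity, three mentions: grouping them under the same Entity
# plays the role of the co-reference link.
condition = Entity("Condition")
for phrase in ("tumour", "melanoma", "lump"):
    condition.mentions.append(mark(DOCUMENT, phrase, "Condition"))

mention_texts = [m.text(DOCUMENT) for m in condition.mentions]
print(mention_texts)
```

Storing offsets rather than copied strings keeps each mention grounded in the document text, so the same word appearing elsewhere in the document is not confused with the annotated span.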
Summary
The entities