Terminology

This section describes the terms used to discuss annotations. Although some annotators will be familiar with the CLEF project and its language, others may have no background in natural language processing, ontologies, or any of our other disciplines. In addition, CLEF project partners have their own terminologies for their own work. These often conflict, and can cause confusion. For the purposes of annotating the CLEF gold standard, a single terminology will be adopted.

(Technical note: as the annotations are primarily for use in an information extraction gold standard, the terminology will be based on that used in information extraction. Specifically, it will be based on the terminology that has evolved in the MUC, ACE, and TIMEX annotation exercises and evaluations.)

An example

The terminology used is described below. It makes use of the following example:

Annotation terminology

Term | Description | Example
Entity | An entity is a thing in the world. It has an existence independent of the text: it is not a piece of text. It may be concrete or abstract. Explicit entities are mentioned in the text. Implicit entities are not mentioned, but their existence may be inferred from the text. We are not interested in annotating implicit entities, only those that are explicitly mentioned in the text. | In the example, the piece of real-world tissue that is Mr. Jones's melanoma is an explicit entity. So is his left second toe. But the bit of flesh and bone that is Mr. Jones's left foot is an implicit entity: it is not mentioned in the text, although we can infer that it exists. We are only interested in the things mentioned in the text: the melanoma and the toe, not the foot.
Mention | A mention is the textual realisation of an entity. A single explicit entity may have more than one mention. An implicit entity has no mentions. | In the example, the surface language strings "melanoma" and "it" are both mentions of the entity that is the real-world lump, Mr. Jones's melanoma.
Signal | A signal is a piece of text that provides extra information about an entity. It may modify the entity, providing a value for one of the entity's attributes. | In the example, "left" signals something about the toe: its laterality attribute. Also, "no" signals something about the secondaries entity: that it does not exist.
Reference | Mentions refer to entities. They provide references to entities. The reference is the relation between the mention and the entity. | As in the Mention example, "melanoma" and "it" both provide references to the entity that is Mr. Jones's melanoma.
Co-reference | When two or more mentions refer to the same entity, they corefer. | "melanoma" and "it" corefer: both refer to the entity that is Mr. Jones's melanoma.
Type | A type is a categorisation of an entity or signal. Each entity or signal will have a type. | In the example, we may have types of PERSON, DISEASE and BODY-PART. The entity that is Mr. Jones's toe has a type of BODY-PART. Mr. Jones has a type of PERSON.
Relationship | A relationship exists between two entities. It describes some interaction between those two entities in the world. Like entities, relationships have a type. | Mr. Jones's melanoma is located in his toe. We say that there is a relationship between the melanoma and the toe. It could have a type of LOCATION.
Modifier relationship | A modifier relationship exists between every signal and the entity that it provides information about. The modifier relationship provides the entity with some attributional property. Typically, this will have a value selected from a limited set of possible values. | In the example, the "left" signal modifies the toe entity. It gives the toe an attribute with a value, such as laterality=left.
Argument | The entities and signals that are related by a relationship are called its arguments. | In the Relationship example, the melanoma and toe entities are arguments to the LOCATION relationship.
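
To show how these terms fit together, here is a minimal Python sketch of the terminology as a data model. It is an illustration only, not the CLEF annotation format or any tool's API: the class names, the `corefer` and `apply_modifier` helpers, the MODIFIER type name, the mention string "toe", and the choice of DISEASE as the melanoma's type are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Mention:
    """A textual realisation of an entity (a surface string)."""
    text: str


@dataclass
class Entity:
    """A thing in the world. Explicit entities have one or more mentions."""
    type: str
    mentions: List[Mention] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Signal:
    """A piece of text that provides extra information about an entity."""
    text: str
    type: str


@dataclass
class Relationship:
    """A typed link whose arguments are entities or signals."""
    type: str
    arguments: List[Union[Entity, Signal]]


def corefer(a: Mention, b: Mention, entities: List[Entity]) -> bool:
    """Return True when both mentions refer to the same entity.

    Identity (`is`) is used rather than equality, because two separate
    mentions can share the same surface string.
    """
    for entity in entities:
        has_a = any(m is a for m in entity.mentions)
        has_b = any(m is b for m in entity.mentions)
        if has_a and has_b:
            return True
    return False


def apply_modifier(signal: Signal, entity: Entity) -> Relationship:
    """Create a modifier relationship and set the attribute it provides.

    Assumption: the signal's type names the attribute, and the signal's
    text is the attribute value (for example laterality=left).
    """
    entity.attributes[signal.type] = signal.text
    return Relationship(type="MODIFIER", arguments=[signal, entity])


# The running example, with assumed mention strings and types.
melanoma = Entity("DISEASE", [Mention("melanoma"), Mention("it")])
toe = Entity("BODY-PART", [Mention("toe")])
left = Signal("left", "laterality")

location = Relationship("LOCATION", [melanoma, toe])
modifier = apply_modifier(left, toe)
```

In this sketch an implicit entity, such as Mr. Jones's left foot, would be an `Entity` with an empty `mentions` list, which is why it is never annotated.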