

Annotating text: general guidelines

Introduction

For our purposes, annotating text is the process of marking stretches, or spans, of text in some way, signifying that each span has particular semantics. Typically, an annotator carries out this process with a software tool, which is used to associate annotations with spans of text, describing the semantics of those spans. In addition to annotations associated directly with a span, other annotations may be added that describe relationships between spans. Annotation is about the text: what appears in it and what it means. It is not about building an abstract model of the text: it is grounded in the document itself.
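The model described above (spans of text carrying semantics, plus relationships between spans) can be sketched as a small data structure. This is a minimal sketch, assuming annotations are stored as character offsets into the document. The class names `Annotation` and `Relationship` and the labels `condition`, `locus` and `has_location` are hypothetical, chosen for illustration only; they are not taken from the CLEF tools.

```python
from dataclasses import dataclass


# Hypothetical data model for illustration; not the CLEF tool's own format.
@dataclass(frozen=True)
class Annotation:
    """A span of the source text, tagged with a semantic label."""
    start: int   # offset of the first character of the span
    end: int     # offset one past the last character of the span
    label: str   # the semantics attached to the span (assumed label names)

    def text(self, document: str) -> str:
        # Annotation is grounded in the document: the surface text of
        # every annotation can be recovered from its offsets.
        return document[self.start:self.end]


@dataclass(frozen=True)
class Relationship:
    """A typed link between two annotated spans."""
    label: str
    source: Annotation
    target: Annotation


document = "Pain in the left leg."
pain = Annotation(0, 4, "condition")
leg = Annotation(17, 20, "locus")
pain_in_leg = Relationship("has_location", pain, leg)

print(pain.text(document), pain_in_leg.label, leg.text(document))
```

Storing offsets rather than copies of the words keeps every annotation tied to the exact place it occurs, which matches the principle that annotation describes the document rather than an abstract model of it.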

In CLEF, the annotation process can be split into four sub-tasks:

These guidelines describe how annotators should map from the surface text to annotation:

This section gives some general guidelines for annotating text. This is followed by specific guidelines for each entity, signal, and relationship type.

Collaboration between annotators

Annotate words, not concepts

If something appears too complex to annotate, or you are unsure ...

Don't base annotation on your own view of what should be in a medical record

Overlapping and containment of annotations

Breaking down phrases

Implied entities

Relationships and domain knowledge

Signals: modifying entities

Signals are additional words that modify an entity, to provide extra information about it. For example, "_left_ leg", "_no_ metastases", "_upper_ back". Signals always modify an entity that is closely associated with them. Signals are related to their main entity with a "modifies" relationship, so we might create an annotation that says "left modifies leg". The modifies relationship is not like other relationships. It says something about the linguistic structure of a phrase, and is much less about clinical (domain) knowledge than other relationships.
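A "modifies" link can be recorded much like any other relationship, except that it always joins a signal to the entity it qualifies. The following is a minimal sketch, assuming signals and entities are both stored as character-offset spans; `Span`, `Modifies`, `describe` and `signals_for` are hypothetical names, not part of any CLEF tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    kind: str  # assumed values: "entity" or "signal"

    def text(self, document: str) -> str:
        return document[self.start:self.end]


@dataclass(frozen=True)
class Modifies:
    """Hypothetical record for a signal-to-entity "modifies" link."""
    signal: Span
    entity: Span

    def __post_init__(self) -> None:
        # A modifies link always runs from a signal to an entity.
        if self.signal.kind != "signal" or self.entity.kind != "entity":
            raise ValueError("modifies must link a signal to an entity")


def describe(rel: Modifies, document: str) -> str:
    # Render the link in the "left modifies leg" form used in the text.
    return f"{rel.signal.text(document)} modifies {rel.entity.text(document)}"


def signals_for(entity: Span, relations: list[Modifies]) -> list[Span]:
    # Collect every signal that modifies the given entity.
    return [r.signal for r in relations if r.entity == entity]


document = "Pain in the left leg; no metastases."
left = Span(12, 16, "signal")
leg = Span(17, 20, "entity")
no = Span(22, 24, "signal")
metastases = Span(25, 35, "entity")

links = [Modifies(left, leg), Modifies(no, metastases)]
for link in links:
    print(describe(link, document))
```

Because the link records only which word qualifies which, it captures phrase structure and carries no clinical knowledge, in contrast to relationships such as a condition's location.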

Metonymy

Cross-document inference

Plurals, conjunctions and sets

Spelling and other mistakes