Annotating text: a recipe

Introduction

It would be possible to read a document from start to end, marking each annotation in the order it is found, but this has not been found to give the most accurate results. A more methodical approach is useful, and one is given below. For annotation to be consistent, all annotators should follow it.

Once experienced at the task, annotators may find it quicker to interleave these steps. This is understandable. Annotators who do this should afterwards go over the whole document following the recipe below, in case anything has been missed by their more ad-hoc approach.

(Technical note: this recipe assumes that annotators are using the Knowtator tool, and the steps have been chosen with this in mind.)

Summary

Step Sub-step
1. Read the document  
2. Mark the entities  
3. Mark the signals  
  3.1 Add modifier relationships for each signal as you add the signal
4. Check for co-references  
  4.1 Add any additional co-referring entities you might find, such as pronouns
5. Relationships etc.: for each entity in turn  
  5.1 Check entity spelling
  5.2 Check for relationships with other entities
6. Record additional information and time taken  
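The summary above can also be kept as a simple checklist, for example to tick steps off during the final pass over a document. The following is only a minimal sketch: the names `RECIPE` and `remaining_steps` are hypothetical and are not part of Knowtator or of these guidelines.

```python
# Steps and sub-steps, following the summary table above.
RECIPE = [
    ("1", "Read the document"),
    ("2", "Mark the entities"),
    ("3", "Mark the signals"),
    ("3.1", "Add modifier relationships for each signal as you add the signal"),
    ("4", "Check for co-references"),
    ("4.1", "Add any additional co-referring entities, such as pronouns"),
    ("5", "Relationships etc.: for each entity in turn"),
    ("5.1", "Check entity spelling"),
    ("5.2", "Check for relationships with other entities"),
    ("6", "Record additional information and time taken"),
]


def remaining_steps(done):
    """Return the (number, label) pairs, in recipe order, not yet ticked off."""
    return [(number, label) for number, label in RECIPE if number not in done]


# Example: steps 1 to 3 have been ticked off so far in this pass.
for number, label in remaining_steps({"1", "2", "3", "3.1"}):
    print(number, label)
```

Keeping the steps in a single ordered list means the checklist always follows the recipe order, whatever order the steps were ticked off in.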

1. Read the whole document

Read the document through in its entirety, marking no annotations, to get an understanding of it.

2. Mark the entities

Read the document a second time, adding annotations for the mentions (including pronouns) of these basic entities (in parallel if you find this easier):

Certain entities may suggest that others also exist. You should bear in mind the following:

3. Mark the signals

Now go through each of the conditions, loci, and interventions, checking for modifiers, qualifications, and associated text that signify further annotations (in parallel if you find this easier):

4. Co-reference

Now go through each of the mentions in turn, and check to see if it co-refers with any other mention. At the same time, check the text to see if you have missed any mentions that could be co-referred, in particular pronouns (things like "it", "this", "which").

  1. Create a co-reference annotation whenever you find two entities referring to the same thing, linking the co-reference to the first mention of that thing.
  2. Make sure that pronouns have also been marked as mentions and co-referred.
  3. Add any additional entities that you spot, and co-refer them.
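The co-reference steps above can be sketched in code. This is only an illustration under stated assumptions: the `Mention` record, the helper names (`mention_of`, `link_to_first`, `unmarked_pronouns`), and the sample sentence are all hypothetical, and none of this reflects how Knowtator stores annotations. The pronoun list holds just the three examples given above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record (not Knowtator's data model)."""
    start: int     # character offset where the mention begins
    end: int       # character offset just past the mention
    referent: str  # annotator's label for the thing referred to


def mention_of(text, word, referent):
    """Build a Mention for the first occurrence of `word` in `text`."""
    start = text.index(word)
    return Mention(start, start + len(word), referent)


def link_to_first(mentions):
    """Pair each later mention with the first mention of the same referent."""
    first = {}
    links = []
    for mention in sorted(mentions, key=lambda m: m.start):
        if mention.referent in first:
            links.append((mention, first[mention.referent]))
        else:
            first[mention.referent] = mention
    return links


# Only the example pronouns from the text; a real list would be longer.
PRONOUNS = re.compile(r"\b(it|this|which)\b", re.IGNORECASE)


def unmarked_pronouns(text, mentions):
    """Return (offset, word) for each pronoun not inside any mention span."""
    missed = []
    for match in PRONOUNS.finditer(text):
        covered = any(
            m.start <= match.start() and match.end() <= m.end for m in mentions
        )
        if not covered:
            missed.append((match.start(), match.group()))
    return missed


# Invented sample sentence, for illustration only.
text = "A mass was seen in the liver. It was biopsied, which confirmed this."
mentions = [
    mention_of(text, "mass", "mass"),
    mention_of(text, "It", "mass"),
]

for later, first in link_to_first(mentions):
    print(text[later.start:later.end], "->", text[first.start:first.end])
print(unmarked_pronouns(text, mentions))
```

Anything returned by `unmarked_pronouns` is only a candidate for review: the annotator still decides whether each pronoun is a mention and what it co-refers with.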

5. Relationships

Now go through each of the mentions in turn, and decide if any have relationships with other entities. At the same time, check the spelling of the mention.

Please take care not to over-annotate relationships. You do not need to hunt for every single possible relationship that you can deduce with clinical knowledge - only those that seem directly relevant to the section of text you are reading. Please see the general guidelines on annotating relationships for a discussion of this.

  1. Is the mention or signal misspelt? If so, record it as such.
  2. Consider the basic annotations and how they relate to others, adding relationships where they exist. In parallel:

6. Recording additional information

As you are annotating the document, record any comments that you feel are important. You may need to do this in a separate text file, or perhaps in the annotation tool itself. For example, record: