Computer methods for identifying significant features in protein sequences.

Perkins, David (1994) Computer methods for identifying significant features in protein sequences. PhD thesis, University of Leeds.

599924.pdf - Final eThesis - complete (pdf)

Abstract

The research described in this thesis can be easily and conveniently separated under two broad headings: the definition of discriminating motif sets for protein families, and software development. In this instance, the phrase "motif set" refers to a combination of features in the amino acid sequences of a family of proteins that is diagnostic of family membership and therefore has predictive value in identifying new family members. Under the first heading, a number of sets of motifs are described in detail, while a number of others are included as an appendix in a format compatible with the PRINTS motif database. All these studies involved the multiple alignment of protein sequences extracted from the database and the use of database scanning techniques. From these motif sets it has been possible to identify new members of protein families, and they may also supply valuable information for the exploration of the possible function and structure of the protein families. A number of sequence analysis software packages are also described. They include both novel software and the reworking of old algorithms, with additions to make them more efficient and more useful for modern requirements, and to fix existing problems. In the former category, new sequence alignment programs have been developed which integrate structural information (if any is available) with sequence and physicochemical properties. A number of programs are also discussed that allow the display and manipulation of a variety of sequence parameters, such as hydropathy and positional variability, which are very useful tools for motif definition. All these programs are written in C, the majority make use of the X/Motif programming libraries where appropriate, and they are available on a variety of different hardware platforms. The ADSP system has also been rewritten to make it more efficient, and it has been ported to the UNIX operating system to make it more accessible to a larger number of users.

Item Type: Thesis (PhD)
Academic Units: The University of Leeds > Faculty of Biological Sciences (Leeds)
Other academic unit: Department of Biochemistry and Molecular Biology
Identification Number/EthosID: uk.bl.ethos.599924
Depositing User: Ethos Import
Date Deposited: 10 Nov 2014 13:28
Last Modified: 23 Apr 2015 11:35
URI: http://etheses.whiterose.ac.uk/id/eprint/6772
