
A corpus based approach to generalise a chatbot system

Abu Shawar, Bayan Aref (2005) A corpus based approach to generalise a chatbot system. PhD thesis, University of Leeds.


Download (2195Kb)


Chatbots are computer programs that interact with users in natural language. Most developers have built their systems with the aim of fooling users into believing that they are talking with a real human, and until now most chatbots have served mainly to amuse users through chatting with a robot. However, the knowledge bases of almost all chatbots are edited manually, which restricts users to specific languages and domains. This thesis shows that chatbot technology can be used in many ways beyond entertainment: as a tool to learn or practise a new language, a tool to access an information system, a tool to visualise the contents of a corpus, and a tool to answer questions in a specific domain. Instead of being restricted to a specific domain or written language, a chatbot can be trained with any text in any language. Some of the differences between real human conversations and human-chatbot dialogues are presented. A Java program has been developed to read a machine-readable text (corpus) and convert it to the ALICE chatbot format language (AIML). The program was built to be general; generality here means that it imposes no restrictions on language, domain or structure. Several languages were tested: English, Arabic, Afrikaans, French and Spanish. Corpora with different structures were also used: dialogue, monologue and structured text.
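The corpus-to-AIML conversion described in the abstract can be illustrated with a minimal Java sketch. This is not the thesis's program: the class name `CorpusToAimlSketch`, its methods, and the rule of pairing each dialogue turn with the turn that follows it are assumptions made here purely for illustration, and the normalisation step (upper-casing and stripping punctuation) is a simplified guess at what an AIML pattern requires.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch only; the names and pairing rule are assumptions,
// not the program developed in the thesis.
class CorpusToAimlSketch {

    // Escape characters that are special in XML so the template stays well-formed.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Assumed normalisation: upper-case the utterance, replace anything that is
    // not a letter, combining mark, digit or space with a space, then collapse
    // runs of whitespace.
    static String normalisePattern(String utterance) {
        return utterance.toUpperCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{M}\\p{N} ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    // Treat each turn as a pattern and the following turn as its template.
    static String toAiml(List<String> turns) {
        StringBuilder aiml = new StringBuilder("<aiml version=\"1.0.1\">\n");
        for (int i = 0; i + 1 < turns.size(); i++) {
            String pattern = normalisePattern(turns.get(i));
            if (pattern.isEmpty()) {
                continue; // nothing left to match on, so skip this pair
            }
            aiml.append("  <category>\n")
                .append("    <pattern>").append(pattern).append("</pattern>\n")
                .append("    <template>")
                .append(escapeXml(turns.get(i + 1).trim()))
                .append("</template>\n")
                .append("  </category>\n");
        }
        return aiml.append("</aiml>\n").toString();
    }

    public static void main(String[] args) {
        List<String> turns = Arrays.asList("Hello there!", "Hi, how are you?", "Fine, thanks.");
        System.out.print(toAiml(turns));
    }
}
```

Because the sketch relies only on Unicode character classes rather than language-specific rules, the same code accepts text in any of the languages listed above. Monologue or structured text would need a different rule for choosing pattern and template pairs.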

Item Type: Thesis (PhD)
Additional Information: Supplied directly by the School of Computing, University of Leeds.
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Identification Number/EthosID: uk.bl.ethos.529697
Depositing User: Dr L G Proll
Date Deposited: 08 Mar 2011 10:14
Last Modified: 07 Mar 2014 11:23
URI: http://etheses.whiterose.ac.uk/id/eprint/1323

