
A performance analysis of a hybrid relational-XML approach to store partially-structured data

Abdel Kader, Yasser (2007) A performance analysis of a hybrid relational-XML approach to store partially-structured data. PhD thesis, University of Sheffield.

Abstract

Nowadays, huge amounts of data are stored outside the rigid boundaries of traditional, highly-structured database management systems, for example on the World Wide Web, in application data that uses non-standard formats, in legacy systems and in structured documents. This data does not conform to a pre-defined structure, yet it is not completely unstructured; it is classified as semi-structured data. There is a need to store and manage the large existing collections of semi-structured data and to query them as efficiently as traditional databases allow, but as yet no mature technology for doing so exists. Meanwhile, eXtensible Markup Language (XML) has emerged as the lingua franca of the web. XML can represent all forms of structured data (highly-, semi- and un-structured). This research aims to enhance the performance of storing, querying and retrieving XML data that contains a combination of highly-structured and semi-structured data (this hybrid structuring can be described as partially-structured data), so as to better support classes of application where there is a fixed formal framework for data, but also an ad hoc component. One way to manage XML data is to use relational database management systems. This approach builds on the robust, well-established and optimised performance that relational database management systems can offer. The research presented in this thesis is concerned with seeking ways of further exploiting these advantages in adapting relational technology to store XML data. To this end, the research has proposed a hybrid relational-XML storage model for partially-structured XML-encoded data, in which a combination of structure mapping and XML types is used within a relational database management system, so as to exploit prior knowledge of the highly-structured part in query processing while retaining the flexibility to store the semi-structured part.
A set of experiments was designed to evaluate query performance for partially-structured data under three storage models: structure mapping to relational tables, XML types, and the hybrid model. These experiments were evaluated using a standard benchmark set of queries. The analysis of the experimental results establishes how query performance is affected as structuredness, data volume and query characteristics change. The results showed that no single storage model outperformed all the others in all cases. In most cases, the hybrid model performed better than both the relational and the XML data type models. The research proposed a method by which the database designer can use the results of the performance analysis to seek optimal relational storage models for XML-encoded partially-structured data.
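The hybrid idea described in the abstract can be illustrated with a minimal sketch. It is not the storage model or schema used in the thesis: the table `item`, its columns, the element names and the helper functions are all hypothetical, and because SQLite has no native XML type, a TEXT column parsed with Python's `xml.etree.ElementTree` stands in for an XML-typed column. The highly-structured fields are mapped to ordinary columns that SQL can filter on, while the ad hoc part is kept as an XML fragment.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_store():
    """Create an in-memory database with one hybrid table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"        # highly-structured part: plain columns
        " price REAL NOT NULL,"
        " extra_xml TEXT NOT NULL)"   # semi-structured part: XML kept as text
    )
    return conn


def insert_item(conn, xml_text):
    """Shred the fixed fields into columns and keep <extra> as an XML fragment."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    price = float(root.findtext("price"))
    extra = root.find("extra")
    if extra is None:
        extra_xml = "<extra/>"
    else:
        extra_xml = ET.tostring(extra, encoding="unicode").strip()
    conn.execute(
        "INSERT INTO item (name, price, extra_xml) VALUES (?, ?, ?)",
        (name, price, extra_xml),
    )


def query(conn, max_price, path):
    """Filter on a relational column in SQL, then look up `path` in the XML part."""
    rows = conn.execute(
        "SELECT name, extra_xml FROM item WHERE price <= ? ORDER BY id",
        (max_price,),
    )
    results = []
    for name, extra_xml in rows:
        node = ET.fromstring(extra_xml).find(path)
        if node is not None:
            results.append((name, node.text))
    return results


if __name__ == "__main__":
    store = build_store()
    insert_item(
        store,
        "<item><name>lamp</name><price>12.5</price>"
        "<extra><note>fragile</note></extra></item>",
    )
    print(query(store, 50, "note"))
```

In this sketch the price predicate is handled by the relational engine before any XML is parsed, which is the kind of use of known structure that the abstract refers to; a database with a real XML type would evaluate the path expression inside the engine instead.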

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Depositing User: EThOS Import Sheffield
Date Deposited: 24 Apr 2013 13:09
Last Modified: 08 Aug 2013 08:52
URI: http://etheses.whiterose.ac.uk/id/eprint/3600
