White Rose University Consortium logo
University of Leeds logo University of Sheffield logo York University logo

Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding

Al Zadjali, Hanaa (2017) Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding. PhD thesis, University of Sheffield.

[img]
Preview
Text (PhD thesis)
Last version after viva MyThesis (Hanaa AlZadjali).pdf
Available under License Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 UK: England & Wales.

Download (5Mb) | Preview

Abstract

The flexibility and self-describing nature of XML has made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via XML labelling scheme in order to permit answers to queries without the need to access original XML files. As the transmission of XML data over the Internet has become vibrant, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in terms of their label size, which can result in overflow problems and/or ambiguous data/query retrievals. This thesis considers the compression of XML labels. A novel XML labelling scheme, named “Base-9”, has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9’s XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which has achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage. Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of such short labels as those generated by the combination of applying the Base-9 scheme and the use of Fibonacci encoding in terms of storing, updating, retrieving and querying XML data are supported by the experimental results reported herein.

Item Type: Thesis (PhD)
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Computer Science (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > Computer Science (Sheffield)
Identification Number/EthosID: uk.bl.ethos.727296
Depositing User: Mrs Hanaa Al Zadjali
Date Deposited: 20 Nov 2017 08:56
Last Modified: 12 Oct 2018 09:47
URI: http://etheses.whiterose.ac.uk/id/eprint/18571

You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.

Actions (repository staff only: login required)