Fahey, Polly Victoria ORCID: https://orcid.org/0000-0002-3781-5313 (2021) Sublinear Time Evaluation of Logically Defined Queries on Databases of Bounded Degree. PhD thesis, University of Leeds.
Abstract
Currently, there is an ever-growing need for extremely efficient algorithms to extract information from data. However, when the input data is huge, even reading the whole input once is too costly and hence many algorithms that are traditionally classified as 'efficient' (i.e. run in polynomial time) become impractical. This motivates the study of algorithms that run in sublinear time. To achieve sublinear time, it is unavoidable that we must sacrifice some accuracy (since we cannot read the whole input), but in view of many practical applications it is crucial that we provide guarantees on the accuracy of the output. In this thesis, we aim at providing sublinear time algorithms for the approximate evaluation of queries on relational databases of bounded degree, that come with probabilistic accuracy guarantees. We mainly focus on queries definable in first-order logic and monadic second-order logic with counting.
For boolean queries, we use the framework of property testing. In property testing, for a property P, the goal is to distinguish inputs that have P from those that are far from having P with high probability correctly, by querying only a small number of local parts of the input. Much research has focussed on the query complexity of property testing algorithms, i.e. the number of queries the algorithm makes to the input, but in view of applications, the running time of the algorithm is equally relevant. In the bounded degree relational database model, which was introduced in (Adler, Harwath STACS 2018) and is a natural extension of the bounded degree graph model, the distance to P is measured by the number of tuple modifications (additions or deletions), that are necessary to transform a database into one with property P. We introduce a new model, which is based on the bounded degree relational database model, but the distance measure allows both tuple modifications and element modifications. We begin an investigation of which conditions allow constant time property testing algorithms in our new model.
In particular, we show that on databases of bounded degree and bounded tree-width, all boolean queries expressible in monadic second-order logic with counting are testable in constant time.
On the way to proving our results in the new model we also partially answer an open problem by Alon. Alon (Proposition 19.10 in: Lovász, Large networks and graph limits, 2012) proved that for every bounded degree graph G there exists a constant size graph H that has a similar neighbourhood distribution to G. This proof is only existential and it was suggested as an open problem to find explicit bounds on the size of H. We find bounds on the size of H for two special cases.
Furthermore, we study query enumeration of non-boolean queries. In the query enumeration problem, after a preprocessing phase the aim is to enumerate all answers to the query with only a small delay between any two consecutive answers. We introduce a new model for the approximate query enumeration on classes of relational databases of bounded degree. We aim for sublinear time preprocessing and constant delay. Since sublinear running time does not allow reading the whole input database even once, sacrificing some accuracy is inevitable for our speed-up. Nevertheless, our enumeration algorithms come with guarantees: With high probability, (1) only tuples are enumerated that are answers to the query or `close' to being answers to the query, and (2) if the proportion of tuples that are answers to the query is sufficiently large, then all answers will be enumerated. Here the notion of `closeness' is a tuple edit distance in the input database.
We identify conditions under which first-order definable queries can be enumerated approximately with constant delay after a sublinear preprocessing phase. In particular, we show that on databases of bounded degree and bounded tree-width, queries expressible in first-order logic can be approximately enumerated with constant delay after a sublinear preprocessing phase.
Metadata
Supervisors: | Adler, Isolde |
---|---|
Keywords: | Property Testing, First-order Logic, Query Enumeration, Sublinear Time Algorithms |
Awarding institution: | University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds) |
Identification Number/EthosID: | uk.bl.ethos.842718 |
Depositing User: | Miss Polly Victoria Fahey |
Date Deposited: | 03 Dec 2021 11:02 |
Last Modified: | 11 Dec 2022 10:53 |
Open Archives Initiative ID (OAI ID): | oai:etheses.whiterose.ac.uk:29690 |
Download
Final eThesis - complete (pdf)
Filename: Fahey_P_Computing_PhD_2021.pdf
Licence:
This work is licensed under a Creative Commons Attribution NonCommercial ShareAlike 4.0 International License
Export
Statistics
You do not need to contact us to get a copy of this thesis. Please use the 'Download' link(s) above to get a copy.
You can contact us about this thesis. If you need to make a general enquiry, please see the Contact us page.