Combinatorial Collisions in Database Matching: With Examples from DNA

Abstract

Databases containing information such as location points, web searches, and financial transactions are becoming the new normal as technology advances. Consequently, searches and cross-referencing in big data are becoming a common problem as computing and statistical analysis increasingly allow the contents of such databases to be analyzed and dredged for data. Searches through big data are frequently done without a hypothesis formulated beforehand, and as these databases grow and become more complex, the room for error also increases. Regardless of how these searches are framed, the data they collect may lead to false convictions. DNA databases may be of particular interest, since DNA is often viewed as significant evidence; however, such evidence is sometimes not interpreted properly in the courtroom. In this thesis, we present and validate a framework for investigating various collisions within databases using Monte Carlo simulations, with examples from DNA. We also discuss how DNA evidence may be wrongly portrayed in the courtroom, and the explanation behind this. We then outline the problems which may occur when numerous types of databases are searched for suspects, and a framework to address these problems.

Author Keywords: big data analysis, collisions, database searches, DNA databases, Monte Carlo simulation

    Item Description
    Type
    Contributors
    Creator (cre): Johnson, Stephanie
    Thesis advisor (ths): Pollanen, Marco
    Thesis advisor (ths): Burr, Wesley
    Degree granting institution (dgg): Trent University
    Date Issued
    2020
    Date (Unspecified)
    2020
    Place Published
    Peterborough, ON
    Language
    Extent
    122 pages
    Rights
    Copyright is held by the author, with all rights reserved, unless otherwise noted.
    Subject (Topical)
    Local Identifier
    TC-OPET-10827
    Publisher
    Trent University
    Degree