An Investigation of the Impact of Big Data on Bioinformatics Software

Abstract

As the generation of genetic data accelerates, Big Data has an increasing impact on the way bioinformatics software is used. Experiments are becoming larger and more complex than the software's designers originally envisioned. One way to address this problem is to use parallel computing.

Using the program Structure as a case study, we investigate ways to counteract the challenges posed by growing datasets. We propose an OpenMP parallelization and a hybrid OpenMP-MPI parallelization of the MCMC steps, and analyse their performance in various scenarios.
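As a rough illustration of why Structure's MCMC steps lend themselves to this kind of parallelism, the sketch below assumes that the step being parallelized is the resampling of each allele copy's population of origin (Z) under the admixture model. Given the current allele frequencies P and admixture proportions Q, these draws are conditionally independent across individuals, so contiguous blocks of individuals can be handed to separate workers. The function names, the haploid biallelic data layout and the per-individual seeding scheme are hypothetical simplifications, not the thesis's code, which uses OpenMP threads and MPI processes rather than Python threads.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def update_assignments(geno, q, p, sweep, base_seed, lo, hi):
    """Resample the population of origin of each allele for individuals lo..hi-1.

    Hypothetical simplified layout (haploid, biallelic):
      geno[i][l] -- allele (0 or 1) of individual i at locus l
      q[i][k]    -- admixture proportion of individual i in population k
      p[k][l]    -- frequency of allele 1 at locus l in population k
    """
    K = len(p)
    out = []
    for i in range(lo, hi):
        # One RNG stream per (sweep, individual), so the draws do not depend
        # on how individuals are divided among workers.
        rng = random.Random(f"{base_seed}:{sweep}:{i}")
        z_i = []
        for l, allele in enumerate(geno[i]):
            # Pr(z = k) is proportional to q[i][k] * Pr(observed allele | p[k][l]).
            w = [q[i][k] * (p[k][l] if allele == 1 else 1.0 - p[k][l])
                 for k in range(K)]
            z_i.append(rng.choices(range(K), weights=w)[0])
        out.append(z_i)
    return out


def parallel_sweep(geno, q, p, sweep, base_seed, workers):
    """Split individuals into contiguous blocks and update each block in a worker."""
    n = len(geno)
    # Contiguous blocks, comparable to a static loop schedule or a block
    # decomposition across MPI ranks.
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda b: update_assignments(geno, q, p, sweep, base_seed, *b), bounds)
    # map() yields results in submission order, so concatenation restores
    # the original order of individuals.
    return [z for part in parts for z in part]
```

Because each individual draws from its own seeded stream, the result is identical whether one worker or several are used, which makes a parallel version easy to check against the serial one. In CPython, threads will not speed up this pure-Python loop because of the global interpreter lock; the executor only shows how the work is partitioned.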

The results indicate that both parallelizations produce significant speedups over the serial version in all scenarios tested. By adapting the program to parallel architectures, they allow the available hardware to be used more efficiently. This matters because it not only reduces the time required for existing analyses but also makes feasible new analyses that were previously impractical.

Author Keywords: Big Data, HPC, MCMC, parallelization, speedup, Structure

    Item Description
    Contributors
    Creator (cre): Dobosz, Rafal
    Thesis advisor (ths): McConnell, Sabine
    Thesis advisor (ths): Hurley, Richard
    Degree committee member (dgc): McConnell, Sabine
    Degree committee member (dgc): Hurley, Richard
    Degree committee member (dgc): Hajibabaei, Mehrdad
    Degree committee member (dgc): Cater, Bruce
    Degree granting institution (dgg): Trent University
    Date Issued
    2014
    Date (Unspecified)
    2014
    Place Published
    Peterborough, ON
    Extent
    126 pages
    Rights
    Copyright is held by the author, with all rights reserved, unless otherwise noted.
    Local Identifier
    TC-OPET-10060
    Publisher
    Trent University
    Degree