Dobosz, Rafal

An Investigation of the Impact of Big Data on Bioinformatics Software

Type:
Names:
Creator (cre): Dobosz, Rafal
Thesis advisor (ths): McConnell, Sabine
Thesis advisor (ths): Hurley, Richard
Degree committee member (dgc): McConnell, Sabine
Degree committee member (dgc): Hurley, Richard
Degree committee member (dgc): Hajibabaei, Mehrdad
Degree committee member (dgc): Cater, Bruce
Degree granting institution (dgg): Trent University
Abstract:

As the generation of genetic data accelerates, Big Data has an increasing impact on the way bioinformatics software is used. Experiments are becoming larger and more complex than the software's designers originally envisioned. One way to deal with this problem is to use parallel computing.

Using the program Structure as a case study, we investigate ways to address the challenges created by growing datasets. We propose an OpenMP parallelization and a hybrid OpenMP-MPI parallelization of the Markov chain Monte Carlo (MCMC) steps, and analyse their performance in various scenarios.
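To illustrate the general idea of parallelizing an MCMC step with OpenMP, the following is a minimal sketch and not the code of Structure or of the thesis. It assumes a step in which each individual's population assignment is drawn independently from a table of precomputed weights, so that the loop over individuals can be shared among threads. The function update_assignments, its parameters, and the per-individual seeding scheme are hypothetical names and choices introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 step: advances *state and returns a 64-bit pseudo-random value. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1), built from the top 53 bits. */
static double uniform01(uint64_t *state) {
    return (double)(splitmix64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Draw an index in [0, k) with probability proportional to w[j] >= 0. */
static int sample_categorical(const double *w, int k, uint64_t *state) {
    double total = 0.0;
    for (int j = 0; j < k; j++) total += w[j];
    double u = uniform01(state) * total;
    double acc = 0.0;
    for (int j = 0; j < k; j++) {
        acc += w[j];
        if (u < acc) return j;
    }
    return k - 1; /* fallback for rounding or all-zero weights */
}

/*
 * Hypothetical MCMC sub-step: resample the assignment z[i] of each of n
 * individuals among k populations. weights is an n-by-k row-major table.
 * The iterations are independent, so the loop can be split across threads.
 * Each individual gets its own generator state derived from (seed, sweep, i),
 * so the result does not depend on the number of threads.
 */
void update_assignments(const double *weights, int n, int k,
                        uint64_t seed, uint64_t sweep, int *z) {
    assert(n > 0 && k > 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        uint64_t state = seed + sweep * (uint64_t)n + (uint64_t)i;
        z[i] = sample_categorical(&weights[(size_t)i * (size_t)k], k, &state);
    }
}
```

With GCC or Clang, the pragma takes effect when the code is compiled with -fopenmp; without that flag the pragma is ignored and the loop runs serially with the same results. A hybrid OpenMP-MPI version would additionally distribute work across processes, which this sketch does not attempt.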

The results indicate that the parallelizations produce significant speedups over the serial version in all scenarios tested. Adapting the program to the parallel architecture allows the available hardware to be used more efficiently. This is important because it not only reduces the time required to perform existing analyses, but also opens the door to new analyses that were previously impractical.

Author Keywords: Big Data, HPC, MCMC, parallelization, speedup, Structure

2014