McConnell, Sabine
Using environmental DNA (eDNA) metabarcoding to assess aquatic plant communities
Environmental DNA (eDNA) metabarcoding targets sequences with interspecific variation that can be amplified using universal primers, allowing simultaneous detection of multiple species from environmental samples. I developed novel primers for three barcodes commonly used to identify plant species, and compared amplification success for aquatic plant DNA against pre-existing primers. Control eDNA samples of 45 plant species showed that species-level identification was highest for novel matK and pre-existing ITS2 primers (42% each); remaining primers each identified between 24% and 33% of species. Novel matK, rbcL, and pre-existing ITS2 primers combined identified 88% of aquatic species. The novel matK primers identified the largest number of species from eDNA collected from the Black River, Ontario; 21 aquatic plant species were identified using all primers. This study showed that eDNA metabarcoding allows for simultaneous detection of aquatic plants including invasive species and species-at-risk, thereby providing a biodiversity assessment tool with a variety of applications.
Author Keywords: aquatic plants, biodiversity, bioinformatics, environmental DNA (eDNA), high-throughput sequencing, metabarcoding
A wind tunnel based investigation of three-dimensional grain scale saltation and boundary-layer stress partitioning using Particle Tracking Velocimetry
Aeolian transport of sand particles is an important geomorphic process that occurs over a significant portion of the earth's land surface. Wind tunnel simulations have been used for more than 75 years to advance the understanding of this process; however, there are still several principles that lack validation from direct sampling of the sand particles in flight. Neither the three-dimensional dispersion of particles in flight nor the momentum they carry has been properly measured. This has resulted in the inability to validate numerical particle dispersion models and the key boundary-layer momentum partitioning model that serves as the framework for understanding the air-sand feedback loop. The primary impediment to these measurements being made is a lack of tools suited for the task. To this end, this PhD aims to improve existing particle tracking technology, thus enabling the collection of particle measurements during wind tunnel experiments that would address the aforementioned knowledge gaps.
Through the design and implementation of the Expected Particle Area Searching method, a fully automated particle tracking velocimetry system was developed with the capability to measure within ½ grain diameter of the bed surface under steady state transport conditions. This tool was used to collect the first 3-D data set of particle trajectories, from which it was determined that a mere 1/8th of sand transport is stream aligned and 95% is contained within ±45° of the mean wind direction. Particles travelling at increasing spanwise angles relative to the stream aligned flow were found to exhibit different impact and ejection velocities and angles. Very close to the bed, the decrease in the number of particles with increasing height in the saltation cloud is observed to transition from a power to a linear relation, in contrast to previous literature that observed an exponential decay with coarser vertical resolution.
The first direct measurements of particle-borne stress were captured over a range of wind velocities and were compared with earlier fluid stress measurements taken using Laser Doppler Anemometry. In support of established saltation theory, impacting particle momentum is found to contribute strongly to particle entrainment under equilibrium conditions. In opposition to established theory, however, particle-borne stress was found to reach a maximum above the surface and does not match the change in air-borne stress with increasing distance from the surface. Near surface splashed particles, measured herein for the first time, appear to play a greater role in stress partitioning than previously thought. This study suggests that research is needed to investigate the role of bed load transport on stress partitioning, to differentiate between airborne trajectory types, and to develop particle tracking tools for field conditions.
Author Keywords: Aeolian Transport, Eolian Transport, Particle Tracking Velocimetry, Saltation, Stress Partitioning, Wind Tunnel Simulation
Exploring the Scalability of Deep Learning on GPU Clusters
In recent years, we have observed an unprecedented rise in popularity of AI-powered systems. They have become ubiquitous in modern life, being used by countless people every day. Many of these AI systems are powered, entirely or partially, by deep learning models. From language translation to image recognition, deep learning models are being used to build systems with unprecedented accuracy. The primary downside is the significant time required to train the models. Fortunately, the time needed for training the models is reduced through the use of GPUs rather than CPUs. However, with model complexity ever increasing, training times even with GPUs are on the rise. One possible solution to ever-increasing training times is to use parallelization to enable the distributed training of models on GPU clusters. This thesis investigates how to utilise clusters of GPU-accelerated nodes to achieve the best scalability possible, thus minimising model training times.
Author Keywords: Compute Canada, Deep Learning, Distributed Computing, Horovod, Parallel Computing, TensorFlow
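Scalability in this setting is usually summarised as speedup and parallel efficiency relative to a baseline run. The following is a minimal sketch of that arithmetic in Python; the function name `scaling_report` and the timings are hypothetical placeholders for illustration, not results from the thesis.

```python
def scaling_report(times_by_gpus):
    """Return {n_gpus: (speedup, efficiency)} relative to the smallest GPU count.

    times_by_gpus maps a GPU count to the wall-clock training time in seconds.
    """
    base_n = min(times_by_gpus)
    base_t = times_by_gpus[base_n]
    report = {}
    for n, t in sorted(times_by_gpus.items()):
        speedup = base_t / t                  # how many times faster than the baseline
        efficiency = speedup / (n / base_n)   # fraction of ideal linear scaling achieved
        report[n] = (speedup, efficiency)
    return report


# Hypothetical timings in seconds (illustrative only, not measured values).
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}
report = scaling_report(timings)
for n, (speedup, efficiency) in report.items():
    print(f"{n} GPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency close to 1 indicates near-linear scaling; values that fall as GPUs are added typically point to communication or input-pipeline overhead.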
Utilizing Class-Specific Thresholds Discovered by Outlier Detection
We investigated whether the performance of selected supervised machine-learning techniques could be improved by combining univariate outlier-detection techniques and machine-learning methods. We developed a framework to discover class-specific thresholds in class probability estimates using univariate outlier detection and proposed two novel techniques to utilize these class-specific thresholds. These proposed techniques were applied to various data sets and the results were evaluated. Our experimental results suggest that some of our techniques may improve recall in the base learner. Additional results suggest that one technique may produce higher accuracy and precision than AdaBoost.M1, while another may produce higher recall. Finally, our results suggest that we can achieve higher accuracy, precision, or recall when AdaBoost.M1 fails to produce higher metric values than the base learner.
Author Keywords: AdaBoost, Boosting, Classification, Class-Specific Thresholds, Machine Learning, Outliers
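The abstract above does not say which univariate outlier detector is used or how the two proposed techniques apply the thresholds. As a rough illustration of the general idea only, the sketch below assumes Tukey's lower fence (Q1 minus 1.5 times the interquartile range) as the detector and a simple margin rule for prediction; the function names, the decision rule, and the data are all hypothetical, and every class is assumed to have at least one training instance.

```python
import numpy as np


def class_thresholds(proba, y_true, k=1.5):
    """Per-class threshold: lower Tukey fence of each class's own probability estimates.

    proba:  (n_samples, n_classes) class probability estimates from a base learner
    y_true: (n_samples,) integer class labels
    """
    proba = np.asarray(proba, dtype=float)
    y_true = np.asarray(y_true)
    thresholds = {}
    for c in range(proba.shape[1]):
        p = proba[y_true == c, c]             # estimates for instances that truly belong to c
        q1, q3 = np.percentile(p, [25, 75])
        thresholds[c] = max(0.0, q1 - k * (q3 - q1))
    return thresholds


def predict_with_thresholds(proba, thresholds):
    """Assumed rule: pick the class whose estimate exceeds its threshold by the largest margin."""
    proba = np.asarray(proba, dtype=float)
    t = np.array([thresholds[c] for c in range(proba.shape[1])])
    return np.argmax(proba - t, axis=1)


# Hypothetical probability estimates for a two-class problem.
proba = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4],
         [0.45, 0.55], [0.4, 0.6], [0.35, 0.65], [0.3, 0.7]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = class_thresholds(proba, labels)
predictions = predict_with_thresholds([[0.5, 0.5]], thresholds)
print(thresholds, predictions)
```

Because each class gets its own cut-off, a class whose correct predictions tend to receive low probability estimates is not penalised by a single global 0.5 threshold.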
Machine Learning Using Topology Signatures For Associative Memory
This thesis presents a technique to produce signatures from topologies generated by the Growing Neural Gas algorithm. The generated signatures have the following characteristics: the signature's memory footprint is smaller than that of the "real object", and it represents a point in the n x m multidimensional space. Signatures can be compared based on Euclidean distance, and distances between signatures provide measurements of differences between models. Signatures can be associated with a concept and then be used as a learning step for a classification algorithm. The signatures are normalized and vectorized to be used in multidimensional space clustering. Although the technique is generic in essence, it was tested by classifying handwritten alphabetic and numerical characters and 2D figures, with good accuracy and precision. It can be used for many other purposes related to the classification of shapes and abstract typologies and to associative memory. Future work could incorporate other classifiers.
Author Keywords: Associative memory, Character recognition, Machine learning, Neural gas, Topological signatures, Unsupervised learning
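The abstract above does not give the exact construction of a signature. As a minimal sketch of the normalize, vectorize, and compare steps it describes, the code below assumes the topology is available as an (n, m) array of node positions and assumes a centring, scaling, and row-sorting scheme chosen purely for illustration. The names `signature` and `distance` are hypothetical, and the Growing Neural Gas step itself is not shown.

```python
import numpy as np


def signature(nodes):
    """Turn an (n, m) array of node positions into a normalized vector of length n * m."""
    nodes = np.asarray(nodes, dtype=float)
    centred = nodes - nodes.mean(axis=0)      # remove translation
    scale = np.abs(centred).max()
    if scale > 0:
        centred = centred / scale             # remove overall size
    # Sort rows lexicographically (first column is the primary key) so that
    # the order in which nodes were created does not affect the signature.
    order = np.lexsort(centred.T[::-1])
    return centred[order].ravel()


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal length."""
    return float(np.linalg.norm(np.asarray(sig_a) - np.asarray(sig_b)))


# Hypothetical 4-node topologies in 2D.
square = [[0, 0], [1, 0], [1, 1], [0, 1]]
moved_square = [[8, 8], [5, 5], [8, 5], [5, 8]]   # same shape, shifted, scaled, reordered
kite = [[0, 0], [2, 1], [0, 2], [-1, 1]]

print(distance(signature(square), signature(moved_square)))
print(distance(signature(square), signature(kite)))
```

Under these assumptions, two topologies with the same shape map to the same point, and a nearest-signature lookup can serve as a simple associative memory.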
Size and fluorescence properties of allochthonous dissolved organic matter: characterization, transformations, and reactivity
Dissolved organic matter (DOM) is a mixture of molecules with dynamic structure and composition that are ubiquitous in aquatic systems. DOM has several important functions in both natural and engineered systems, such as supporting microorganisms, governing the toxicity of metals and other pollutants, and controlling the fate of dissolved carbon. The structure and composition of DOM determine its reactivity, and hence its effectiveness in these ecosystem functions.
While the structure, composition, and reactivity of riverine and marine DOM have been previously investigated, those of allochthonous DOM collected prior to exposure to microbes and sunlight have received scant attention. The following dissertation constitutes the first in-depth study of the structure, composition, and reactivity of allochthonous DOM at its point of origin (i.e. leaf leachates, LLDOM), as detected by measuring its size and optical properties. Concomitantly, novel chemometric methods were developed to interpret size-resolved data obtained using asymmetrical flow field-flow fractionation, including spectral deconvolution and the application of machine learning algorithms such as self-organizing maps to fluorescence data using a dataset of more than 1000 fluorescence excitation-emission matrices.
The size and fluorescence properties of LLDOM are highly distinct. Indeed, LLDOM was correctly classified as one of 13 species/sources with 92.5% accuracy based on its fluorescence composition, and LLDOM was distinguished from riverine DOM sampled from eight different rivers with 98.3% accuracy. Additionally, both fluorescence and size properties were effective conservative tracers of DOC contribution in pH-controlled mixtures of leaf leachates and riverine DOM over two weeks. However, the structure of LLDOM responded differently to pH changes for leaves/needles from different tree species, and for older needles. Structural changes were non-reversible.
Copper-binding strength (log K) differed for the different fluorescent components of DOM in a single allochthonous source by more than an order of magnitude (4.73 compared to 6.11). Biotransformation preferentially removed protein/polyphenol-like fluorescence and altered copper-binding parameters: log K increased from 4.7 to 5.5 for one fluorescent component measured by fluorescence quenching, but decreased from 7.2 to 5.8 for the overall DOM, as measured using voltammetry. The complexing capacity of DOM increased in response to biotransformation for both fluorescent and total DOM. The relationship between fluorescence and size properties was consistent for fresh allochthonous DOM, but differed in aged material.
Since the size and fluorescence properties of LLDOM are strikingly different from those of riverine DOM, deeper investigation into transformative pathways and mixing processes is required to elucidate the contribution of riparian plant species to DOM signatures in rivers.
Author Keywords: Analytical chemistry, Chemometrics, Dissolved organic matter (DOM), Field-flow fractionation, Fluorescence spectroscopy, Parallel factor analysis (PARAFAC)
Self-Organizing Maps and Galaxy Evolution
Artificial Neural Networks (ANN) have been applied to many areas of research. These techniques use a series of object attributes and can be trained to recognize different classes of objects. The Self-Organizing Map (SOM) is an unsupervised machine learning technique which has been shown to be successful in the mapping of high-dimensional data into a 2D representation referred to as a map. These maps are easier to interpret and aid in the classification of data. In this work, the existing algorithms for the SOM have been extended to generate 3D maps. The higher dimensionality of the map provides for more information to be made available to the interpretation of classifications. The effectiveness of the implementation was verified using three separate standard datasets. Results from these investigations supported the expectation that a 3D SOM would result in a more effective classifier.
The 3D SOM algorithm was then applied to an analysis of galaxy morphology classifications. It is postulated that the morphology of a galaxy relates directly to how it will evolve over time. In this work, the Spectral Energy Distribution (SED) is used as a source for galaxy attributes. The SED data was extracted from the NASA Extragalactic Database (NED). The data was grouped into sample sets of matching frequencies and the 3D SOM application was applied as a morphological classifier. It was shown that the SOMs created were effective as an unsupervised machine learning technique to classify galaxies based solely on their SED. Morphological predictions for a number of galaxies were shown to be in agreement with classifications obtained from new observations in NED.
Author Keywords: Galaxy Morphology, Multi-wavelength, parallel, Self-Organizing Maps
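To illustrate what extending a SOM from a 2D to a 3D map involves, the sketch below uses the standard online Kohonen update with units arranged on a 3D lattice. It is not the thesis implementation: the grid size, the linear decay schedules, the function names, and the random input data are assumptions made for illustration, and no SED data is involved.

```python
import numpy as np


def train_som_3d(data, grid=(4, 4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a SOM whose units sit on a 3D lattice; returns (weights, coords)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Lattice coordinates of every unit, shape (n_units, 3).
    axes = [np.arange(g) for g in grid]
    coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    # Initialise each unit's weight vector from a randomly chosen sample.
    weights = data[rng.integers(0, len(data), size=len(coords))].copy()
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3       # neighbourhood radius shrinks
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)      # squared lattice distance in 3D
            h = np.exp(-d2 / (2.0 * sigma ** 2))                # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, coords


def map_to_grid(x, weights, coords):
    """Return the (i, j, k) lattice cell of the best-matching unit for sample x."""
    x = np.asarray(x, dtype=float)
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
    return tuple(int(c) for c in coords[bmu])


# Hypothetical 5-feature samples drawn from two clusters.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(1.0, 0.1, (20, 5))])
weights, coords = train_som_3d(data)
cell = map_to_grid(data[0], weights, coords)
print(cell)
```

The only change from a 2D SOM is that the neighbourhood distance is computed over three lattice coordinates, so each unit has more neighbours through which related samples can be placed close together.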