Statistics

Machine Learning for Aviation Data

Type:
Names:
Creator (cre): Meng, Yang, Thesis advisor (ths): McConnell, Sabine, Thesis advisor (ths): Hurley, Richard, Degree granting institution (dgg): Trent University
Abstract:

This thesis is part of an industry project, carried out in collaboration with an aviation technology company, on pilot performance assessment. In this project, we propose using pilots' training data to develop a model that can recognize pilots' activity patterns for evaluation. The data are presented as time series representing a pilot's actions during maneuvers. The main contribution of this thesis concerns a multivariate time series dataset, including its preprocessing and transformation. The main difficulty in time series classification is the sequential ordering of the data along the time dimension. In this thesis, I developed an algorithm that formats time series data into equal-length data.

Three classification and two transformation methods were used. In total, there are six models for comparison. The initial accuracy was 40%. By optimization through resampling, we increased the accuracy to 60%.
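The abstract does not say how the series are brought to equal length. As a minimal sketch of one common option, assuming linear interpolation onto a fixed number of evenly spaced points, the following resamples every channel of a multivariate recording. The function names and the channel names `pitch` and `roll` are hypothetical and are not taken from the thesis.

```python
def resample_series(values, target_len):
    """Linearly interpolate a 1-D sequence onto target_len evenly spaced points."""
    n = len(values)
    if n == 0:
        raise ValueError("cannot resample an empty series")
    if n == 1 or target_len == 1:
        return [float(values[0])] * target_len
    out = []
    for i in range(target_len):
        # Fractional position of the i-th output sample on the original index axis.
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def to_equal_length(recording, target_len):
    """Resample every channel of a multivariate recording to target_len samples."""
    return {name: resample_series(ch, target_len) for name, ch in recording.items()}


# Hypothetical two-channel recording of a single maneuver (3 samples per channel).
maneuver = {"pitch": [0.0, 2.0, 4.0], "roll": [1.0, 1.0, 3.0]}
fixed = to_equal_length(maneuver, 5)
```

Once every recording has the same length, recordings can be stacked into a fixed-size array and passed to a standard classifier.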

Author Keywords: Data Mining, K-NN, Machine Learning, Multivariate Time Series Classification, Time Series Forest

2022

Particulate Matter Component Analyses in Relation to Public Health in Canada

Type:
Names:
Creator (cre): Jarvis, Shannon Margaret, Thesis advisor (ths): Burr, Wesley S, Thesis advisor (ths): Shin, Hwashin H, Degree committee member (dgc): Newlands, Nathaniel, Degree granting institution (dgg): Trent University
Abstract:

This thesis explores the short-term relationship between exposure to ambient air pollution and human health through metrics such as mortality and hospitalization in Canada. We begin by detailing the organization and interpolation of air pollution data from its partially quality-controlled source form. Analyses of seasonal, regional and temporal trends of all major components of PM2.5 were performed, showing a seasonal variation across most regions and validating the dataset.

A one-pollutant statistical Generalized Additive Model was applied to the data, estimating the health risk associated with exposure to thirteen different components of PM2.5. The selected components were those that comprised the majority of the mass and included: sulphate, nitrate, zinc, silicon, iron, nickel, vanadium, potassium, organic carbon, organic matter, elemental carbon, total carbon. Trends based on annual estimates of the association for PM2.5 and its constituents were compared, showing that carbonaceous compounds, sulphate and nitrate had similar estimates of association. Many estimates, as is common in population ecologic epidemiology, had association estimates statistically indistinguishable from zero, but with clear features of interest, including evident differences between cold and warm season associations in Canada's temperate climate.

A method to model two correlated pollutants (in this case, PM2.5 and O3) was developed using thin plate splines. In this approach, the location of the response surface (after accounting for the temperature, a smooth function of time and day of week) that corresponds to the average pollutant concentration and the average plus one unit was used as the estimate of the joint contribution of pollutants due to a unit increase. The estimates from the thin plate spline (TPS) approach were compared to the single pollutant models, with large increases and decreases in PM2.5 and O3 being captured in the TPS estimates. However, this approach indicated significantly larger error in the estimates than would be expected, indicating a possible future area for refinement.

Author Keywords: Air pollution, Environmental Epidemiology, Generalized Additive Models, Human Health, Multivariate Models, Thin Plate Splines

2023

"Multimodal Contrast" from the Multivariate Analysis of Hyperspectral CARS Images

Type:
Names:
Creator (cre): Tabarangao, Joel Torralba, Thesis advisor (ths): Slepkov, Aaron D, Degree granting institution (dgg): Trent University
Abstract:

The typical contrast mechanism employed in multimodal CARS microscopy involves the use of other nonlinear imaging modalities such as two-photon excitation fluorescence (TPEF) microscopy and second harmonic generation (SHG) microscopy to produce a molecule-specific pseudocolor image. In this work, I explore the use of unsupervised multivariate statistical analysis tools such as Principal Component Analysis (PCA) and Vertex Component Analysis (VCA) to provide better contrast using the hyperspectral CARS data alone. Using simulated CARS images, I investigate the effects of the quadratic dependence of CARS signal on concentration on the pixel clustering and classification and I find that a normalization step is necessary to improve pixel color assignment. Using an atherosclerotic rabbit aorta test image, I show that the VCA algorithm provides pseudocolor contrast that is comparable to multimodal imaging, thus showing that much of the information gleaned from a multimodal approach can be sufficiently extracted from the CARS hyperspectral stack itself.

Author Keywords: Coherent Anti-Stokes Raman Scattering Microscopy, Hyperspectral Imaging, Multimodal Imaging, Multivariate Analysis, Principal Component Analysis, Vertex Component Analysis

2014

The Spatial Dynamics of Wind Pollination in Broadleaf Cattail (Typha latifolia): A New Method to Infer Spatial Patterns of Pollen Dispersal

Type:
Names:
Creator (cre): Ahee, Jordan, Thesis advisor (ths): Dorken, Marcel E, Degree committee member (dgc): Freeland, Joanna R, Degree committee member (dgc): Burness, Gary, Degree committee member (dgc): Pond, Bruce, Degree granting institution (dgg): Trent University
Abstract:

Natural populations of flowering plants rarely have perfectly uniform distributions, so trends in pollen dispersal should affect the size of the pollination neighbourhood and influence mating opportunities. Here I used spatial analysis to determine the size of the pollination neighbourhood in a stand of the herbaceous, wind-pollinated plant (Typha latifolia; broad-leaved cattail) by evaluating patterns of pollen production and seed set by individual cattail shoots. I found a positive correlation between pollen production and seed set among near-neighbour shoots (i.e., within 4 m² patches of the stand; Pearson's r = 0.235, p < 0.05, df = 77) that was not driven by a correlation between these variables within inflorescences (Pearson's r = 0.052, p > 0.45, df = 203). I also detected significant spatial autocorrelations in seed set over short distances (up to ~ 5 m) and a significant cross-correlation between pollen production and seed set over distances of < 1 m indicating that the majority of pollination events involve short distances. Patterns of pollen availability were simulated to explore the shape of the pollen dispersal curve. Simulated pollen availability fit actual patterns of seed set only under assumptions of highly restricted pollen dispersal. Together, these findings indicate that even though Typha latifolia produces copious amounts of pollen, the vast majority of pollen dispersal was highly localized to distances of ~ 1 m. Moreover, although Typha latifolia is self-compatible and has been described as largely selfing, my results are more consistent with the importance of pollen transfer between nearby inflorescences. Therefore, realized selfing rates of Typha latifolia should largely depend on the clonal structure of populations.
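The reported significance levels can be checked from r and the degrees of freedom alone. The sketch below assumes the standard two-tailed t-test for Pearson's r, with t = r·sqrt(df / (1 − r²)), and integrates the Student t density numerically with Simpson's rule. The function names are illustrative, and the abstract does not state which test the thesis actually used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def pearson_p_value(r, df, steps=2000):
    """Return (t, two-tailed p) for Pearson's r, where df = n - 2."""
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # probability mass between 0 and |t|
    return t, 1 - 2 * central_mass


# Values taken from the abstract above.
t_patch, p_patch = pearson_p_value(0.235, 77)       # among near-neighbour shoots
t_within, p_within = pearson_p_value(0.052, 203)    # within inflorescences
```

Under these assumptions the first p-value falls below 0.05 and the second lies above 0.45, which agrees with the values quoted in the abstract.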

Author Keywords: clonal structure, correlogram, dispersal curves, pollination, spatial analysis, Typha latifolia

2014

Prescription Drugs: From Paper to Database with Application to Air Pollution-Related Public Health Risk

Type:
Names:
Creator (cre): Sung, Kyungeun, Thesis advisor (ths): Burr, Wesley, Degree committee member (dgc): Shin, Hwashin, Degree committee member (dgc): Pollanen, Marco, Degree granting institution (dgg): Trent University
Abstract:

Medication used to treat human illness is one of the greatest developments in human history. In Canada, prescription drugs have been developed and made available to treat a wide variety of illnesses, from infections to heart disease and so on. Records of prescription drug fulfillment at coarse Canadian geographic scales were obtained from Health Canada in order to track the use of these drugs by the Canadian population.

The obtained prescription drug fulfillment records were in a variety of inconsistent formats, including a large selection of years for which only paper tabular records were available (hard copies). In this work, we organize, digitize, proof and synthesize the full available data set of prescription drug records, from paper to final database. Extensive quality control was performed on the data before use. This data was then analyzed for temporal and spatial changes in prescription drug use across Canada from 1990-2013.

In addition, one of the major research areas in environmental epidemiology is the study of population health risk associated with exposure to ambient air pollution. Prescription drugs can moderate public health risk by reducing the drug user's physiological symptoms and preventing acute health effects (e.g., strokes and heart attacks). The cleaned prescription drug data was considered in the context of a common model to examine its influence on the association between air pollution exposure and various health outcomes. Since prescription drug data were available only at the provincial level, a Bayesian hierarchical model was employed to include prescription drugs as a covariate at the regional level; the regional estimates were then combined to estimate the association at the national level. Although further investigations are required, the study results suggest that prescription drugs influenced the air pollution-related public health risk.

Author Keywords: Data, Error checking, Population health, Prescriptions

2022

A Framework for Testing Time Series Interpolators

Type:
Names:
Creator (cre): Castel, Sophie Terra Marguerite, Thesis advisor (ths): Burr, Wesley S, Degree committee member (dgc): Pollanen, Marco, Degree granting institution (dgg): Trent University
Abstract:

The spectrum of a given time series is a characteristic function describing its frequency properties. Spectrum estimation methods require time series data to be contiguous in order for robust estimators to retain their performance. This poses a fundamental challenge, especially when considering real-world scientific data that is often plagued by missing values and/or irregularly recorded measurements. One area of research devoted to this problem seeks to repair the original time series through interpolation. There are several algorithms that have proven successful for the interpolation of considerably large gaps of missing data, but most are only valid for use on stationary time series: processes whose statistical properties are time-invariant, which is not a common property of real-world data. The Hybrid Wiener interpolator is a method that was designed for repairing nonstationary data, rendering it suitable for spectrum estimation. This thesis work presents a computational framework designed for conducting systematic testing on the statistical performance of this method in light of changes to gap structure and departures from the stationarity assumption. A comprehensive audit of the Hybrid Wiener Interpolator against other state-of-the-art algorithms will also be explored.
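The general shape of such a testing framework can be illustrated with a small harness: impose artificial gaps on a known series, repair them, and score the repair against the withheld values. The sketch below is an assumption-laden stand-in, not the thesis framework (which the keywords indicate was written in R). It uses plain linear interpolation in place of the Hybrid Wiener interpolator, a synthetic sinusoid as the test signal, and hypothetical function names.

```python
import math
import random


def impose_gaps(series, n_gaps, gap_width, rng):
    """Blank out n_gaps runs of gap_width points, keeping both endpoints observed."""
    gapped = list(series)
    for _ in range(n_gaps):
        start = rng.randrange(1, len(series) - gap_width)
        for i in range(start, start + gap_width):
            gapped[i] = None
    return gapped


def linear_fill(series):
    """Fill None gaps by linear interpolation between the nearest observed points."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for i, v in enumerate(series):
        if v is None:
            left = max(k for k in known if k < i)
            right = min(k for k in known if k > i)
            w = (i - left) / (right - left)
            filled[i] = series[left] * (1 - w) + series[right] * w
    return filled


def rmse_on_gaps(truth, gapped, filled):
    """Root-mean-square error evaluated only at the positions that were removed."""
    errs = [(t - f) ** 2 for t, g, f in zip(truth, gapped, filled) if g is None]
    return math.sqrt(sum(errs) / len(errs))


rng = random.Random(0)  # fixed seed so the experiment is repeatable
truth = [math.sin(2 * math.pi * t / 50) for t in range(500)]
results = {}
for width in (2, 10, 25):  # vary the gap structure
    gapped = impose_gaps(truth, n_gaps=10, gap_width=width, rng=rng)
    results[width] = rmse_on_gaps(truth, gapped, linear_fill(gapped))
```

Swapping `linear_fill` for another interpolator, or `truth` for a nonstationary series, would give the kind of comparison across gap structures and stationarity departures that the abstract describes.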

Author Keywords: applied statistics, hybrid wiener interpolator, imputation, interpolation, R statistical software, time series

2020

Range-Based Component Models for Conditional Volatility and Dynamic Correlations

Type:
Names:
Creator (cre): Swanson, Stephen, Thesis advisor (ths): Cater, Bruce, Thesis advisor (ths): Pollanen, Marco, Degree granting institution (dgg): Trent University
Abstract:

Volatility modelling is an important task in the financial markets. This paper first evaluates the range-based DCC-CARR model of Chou et al. (2009) in modelling larger systems of assets, vis-à-vis the traditional return-based DCC-GARCH. Extending Colacito, Engle and Ghysels (2011), range-based volatility specifications are then employed in the first stage of DCC-MIDAS conditional covariance estimation, including the CARR model of Chou et al. (2005). A range-based analog to the GARCH-MIDAS model of Engle, Ghysels and Sohn (2013), which decomposes volatility into short- and long-run components and corrects for microstructure biases inherent to high-frequency price-range data, is also proposed and tested. Estimator forecasts are evaluated and compared in a minimum-variance portfolio allocation experiment following the methodology of Engle and Colacito (2006). Some consistent inferences are drawn from the results, supporting the models proposed here as empirically relevant alternatives. Range-based DCC-MIDAS estimates produce efficiency gains over DCC-CARR which increase with portfolio size.
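To illustrate how a covariance forecast feeds a minimum-variance allocation, the sketch below computes global minimum-variance weights, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), for a given covariance matrix Σ. This is the simplest unconstrained version and may differ from the exact optimization problem in the cited methodology. The 2×2 covariance matrix is made up for illustration, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def min_variance_weights(cov):
    """Global minimum-variance weights: solve cov * x = 1, then rescale to sum to one."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical one-step-ahead covariance forecast for two assets.
cov = [[0.04, 0.01],
       [0.01, 0.09]]
weights = min_variance_weights(cov)
```

In a forecast comparison, each competing model would supply its own `cov` at every rebalancing date, and the realized variance of the resulting portfolios would be compared across models.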

Author Keywords: asset allocation, DCC MIDAS, dynamic correlations, forecasting, portfolio risk management, volatility

2017

Population-Level Ambient Pollution Exposure Proxies

Type:
Names:
Creator (cre): Scott, Carlone Livingston, Thesis advisor (ths): Burr, Wesley S, Degree granting institution (dgg): Trent University
Abstract:

The Air Health Trend Indicator (AHTI) is a joint Health Canada / Environment and Climate Change Canada initiative that seeks to model the Canadian national population health risk due to acute exposure to ambient air pollution. The common model in the field uses averages of local ambient air pollution monitors to produce a population-level exposure proxy variable. This method is applied to ozone, nitrogen dioxide, particulate matter, and other similar air pollutants.

We examine the representative nature of these proxy averages on a large-scale Canadian data set, representing hundreds of monitors and dozens of city-level populations. The careful determination of temporal and spatial correlations between the disparate monitors allows for more precise estimation of population-level exposure, taking inspiration from the land-use regression models commonly used in geography. We conclude this work with an examination of the risk estimation differences between the original, simplistic population exposure metric and our new, revised metric.
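The simple averaging proxy described above can be sketched as follows, assuming one reading per monitor per day and treating missing readings as absent from that day's average. The monitor names and values are invented for illustration. The revised, correlation-aware metric developed in the thesis is not reproduced here.

```python
def daily_exposure_proxy(monitor_readings):
    """Average the available monitor readings for each day, skipping missing values."""
    n_days = len(next(iter(monitor_readings.values())))
    proxy = []
    for day in range(n_days):
        observed = [s[day] for s in monitor_readings.values() if s[day] is not None]
        # A day with no reporting monitors has no defined proxy value.
        proxy.append(sum(observed) / len(observed) if observed else None)
    return proxy


# Hypothetical daily concentrations from three monitors in one city (None = missing).
readings = {
    "monitor_a": [10.0, 12.0, None],
    "monitor_b": [14.0, None, None],
    "monitor_c": [12.0, 16.0, 9.0],
}
proxy = daily_exposure_proxy(readings)
```

Note that on days when few monitors report, the proxy rests on whichever monitors happen to be available, which is one reason the representativeness of such averages is worth examining.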

Author Keywords: Air Pollution, Population Health Risk, Spatial Process, Spatio-Temporal, Temporal Process, Time Series

2019

Size and fluorescence properties of allochthonous dissolved organic matter: characterization, transformations, and reactivity

Type:
Names:
Creator (cre): Cuss, Chad Warren, Thesis advisor (ths): Gueguen, Celine, Degree committee member (dgc): Watmough, Shaun, Degree committee member (dgc): McConnell, Sabine, Degree committee member (dgc): Dillon, Peter, Degree granting institution (dgg): Trent University
Abstract:

Dissolved organic matter (DOM) is a mixture of molecules with dynamic structure and composition that are ubiquitous in aquatic systems. DOM has several important functions in both natural and engineered systems, such as supporting microorganisms, governing the toxicity of metals and other pollutants, and controlling the fate of dissolved carbon. The structure and composition of DOM determine its reactivity, and hence its effectiveness in these ecosystem functions.

While the structure, composition, and reactivity of riverine and marine DOM have been previously investigated, those of allochthonous DOM collected prior to exposure to microbes and sunlight have received scant attention. The following dissertation constitutes the first in-depth study of the structure, composition, and reactivity of allochthonous DOM at its point of origin (i.e. leaf leachates, LLDOM), as detected by measuring its size and optical properties. Concomitantly, novel chemometric methods were developed to interpret size-resolved data obtained using asymmetrical flow field-flow fractionation, including spectral deconvolution and the application of machine learning algorithms such as self-organizing maps to fluorescence data using a dataset of more than 1000 fluorescence excitation-emission matrices.

The size and fluorescence properties of LLDOM are highly distinct. Indeed, LLDOM was correctly classified as one of 13 species/sources with 92.5% accuracy based on its fluorescence composition, and LLDOM was distinguished from riverine DOM sampled from eight different rivers with 98.3% accuracy. Additionally, both fluorescence and size properties were effective conservative tracers of DOC contribution in pH-controlled mixtures of leaf leachates and riverine DOM over two weeks. However, the structure of LLDOM responded differently to pH changes for leaves/needles from different tree species, and for older needles. Structural changes were non-reversible.

Copper-binding strength (log K) differed for the different fluorescent components of DOM in a single allochthonous source by more than an order of magnitude (4.73 compared to 6.11). Biotransformation preferentially removed protein/polyphenol-like fluorescence and altered copper-binding parameters: log K increased from 4.7 to 5.5 for one fluorescent component measured by fluorescence quenching, but decreased from 7.2 to 5.8 for the overall DOM, as measured using voltammetry. The complexing capacity of DOM increased in response to biotransformation for both fluorescent and total DOM. The relationship between fluorescence and size properties was consistent for fresh allochthonous DOM, but differed in aged material.

Since the size and fluorescence properties of LLDOM are strikingly different from those of riverine DOM, deeper investigation into transformative pathways and mixing processes is required to elucidate the contribution of riparian plant species to DOM signatures in rivers.

Author Keywords: Analytical chemistry, Chemometrics, Dissolved organic matter (DOM), Field-flow fractionation, Fluorescence spectroscopy, Parallel factor analysis (PARAFAC)

2015

Modelling Depressive Symptoms in Emerging Adulthood: Intergenerational Risk and the Protective Role of Trait Emotional Intelligence

Type:
Names:
Creator (cre): Snetsinger, Samantha Wynne, Thesis advisor (ths): Parker, James, Degree committee member (dgc): Keefer, Kateryna, Degree committee member (dgc): Carter, Bruce, Degree granting institution (dgg): Trent University
Abstract:

Depression during the transition into adulthood is a growing mental health concern, with overwhelming evidence linking the developmental risk for depressive symptoms with maternal depression. In addition, there is a lack of research on the protective role of socioemotional competencies in this context. This study examines independent and joint effects of maternal depression and trait emotional intelligence (TEI) on the longitudinal trajectory of depressive symptoms during emerging adulthood. A series of latent growth models was applied to three biennial cycles of data from a nationally representative sample (N=933) from the Canadian National Longitudinal Survey of Children and Youth. We assessed the trajectory of self-reported depressive symptoms from age 20 to 24 years, as well as whether it was moderated by maternal depression at age 10 to 11 and TEI at age 20, separately by gender. The results indicated that mean levels of depression declined during emerging adulthood in females, but remained relatively stable in males. Maternal depressive symptoms significantly and positively predicted depressive symptoms across the entire emerging adulthood period in females, but only at age 20-21 in males. In addition, the likelihood of developing depressive symptoms was attenuated by higher global TEI in both females and males, and additionally by higher interpersonal skills in males. Our findings suggest that interventions for depressive symptoms in emerging adulthood should consider the development of socioemotional competencies.

Author Keywords: Depression, Depressive Symptoms, Emerging Adulthood, Intergenerational Risk, Longitudinal, Trait Emotional Intelligence

2020