Applied Modeling and Quantitative Methods
Machine Learning for Aviation Data
This thesis is part of an industry project, carried out in collaboration with an aviation technology company, on pilot performance assessment. In this project, we propose using pilots' training data to develop a model that can recognize pilots' activity patterns for evaluation. The data are presented as time series representing a pilot's actions during maneuvers. The main contribution of this thesis focuses on a multivariate time series dataset, including its preprocessing and transformation. The main difficulty in time series classification lies in the sequential ordering of the data along the time dimension. In this thesis, I developed an algorithm that formats time series data into sequences of equal length.
Three classification methods and two transformation methods were used, giving six models in total for comparison. The initial accuracy was 40%; optimization through resampling increased it to 60%.
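The abstract does not specify how the equal-length formatting works. The following is a minimal sketch that assumes each maneuver is a (time steps × channels) array and that equal length is obtained by rescaling time to [0, 1] and linearly interpolating every channel onto a fixed number of points. The `knn_predict` helper shows how the resulting fixed-length series could feed a K-NN classifier, one of the keyword methods. Both function names and the target length are hypothetical.

```python
import numpy as np


def to_equal_length(series, target_len):
    """Resample one maneuver of shape (T, n_channels) to (target_len, n_channels).

    Time is rescaled to [0, 1] so maneuvers of different durations line up,
    then each channel is linearly interpolated onto target_len even points.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:  # treat a univariate series as a single channel
        series = series[:, None]
    old_t = np.linspace(0.0, 1.0, series.shape[0])
    new_t = np.linspace(0.0, 1.0, target_len)
    return np.column_stack(
        [np.interp(new_t, old_t, series[:, ch]) for ch in range(series.shape[1])]
    )


def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k training series closest in Euclidean distance."""
    dists = np.array([np.linalg.norm(x - query) for x in train_X])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

For example, `[to_equal_length(m, 100) for m in maneuvers]` would give every maneuver 100 samples. Truncation, zero-padding, or fixed-size windowing are alternative ways to reach equal length, each discarding or distorting information differently.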
Author Keywords: Data Mining, K-NN, Machine Learning, Multivariate Time Series Classification, Time Series Forest
Particulate Matter Component Analyses in Relation to Public Health in Canada
This thesis explores the short-term relationship between exposure to ambient air pollution and human health, measured through outcomes such as mortality and hospitalization, in Canada. We begin by detailing the organization and interpolation of air pollution data from its partially quality-controlled source form. Analyses of seasonal, regional and temporal trends in all major components of PM2.5 were performed, showing seasonal variation across most regions and validating the dataset.
A single-pollutant statistical Generalized Additive Model (GAM) was applied to the data to estimate the health risk associated with exposure to thirteen different components of PM2.5. The components were selected because they comprised the majority of the mass, and included sulphate, nitrate, zinc, silicon, iron, nickel, vanadium, potassium, organic carbon, organic matter, elemental carbon and total carbon. Trends in annual estimates of the association for PM2.5 and its constituents were compared, showing that carbonaceous compounds, sulphate and nitrate had similar estimates of association. As is common in population ecologic epidemiology, many association estimates were statistically indistinguishable from zero, but they showed clear features of interest, including evident differences between cold- and warm-season associations in Canada's temperate climate.
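For reference, a typical single-pollutant time-series GAM for daily counts can be written as below. The Poisson log link and the exact terms are assumptions; the covariates mirror the temperature, smooth-time and day-of-week adjustment named for the two-pollutant model. Here $Y_t$ is the daily count of deaths or hospitalizations, $x_t$ the concentration of one PM2.5 component, $s_1$ and $s_2$ smooth functions, and $\gamma_{\mathrm{dow}(t)}$ a day-of-week effect. Under a log link, the pollutant coefficient converts to a percent change in risk per $\Delta$ units of exposure:

```latex
\begin{aligned}
\log \mathbb{E}[Y_t] &= \beta_0 + \beta\, x_t + s_1(t) + s_2(\mathrm{temp}_t) + \gamma_{\mathrm{dow}(t)},\\
\text{percent change in risk per } \Delta \text{ units} &= 100\left(e^{\beta\Delta} - 1\right).
\end{aligned}
```

An estimate that is statistically indistinguishable from zero corresponds to a confidence interval for $\beta$ that contains 0, or equivalently an interval for $e^{\beta\Delta}$ that contains 1.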
A method to jointly model two correlated pollutants (in this case, PM2.5 and O3) was developed using thin plate splines. In this approach, the response surface (after accounting for temperature, a smooth function of time and day of week) was evaluated at the average pollutant concentrations and at the average plus one unit, and the change between these two locations was used as the estimate of the joint contribution of the pollutants due to a unit increase. The estimates from the thin plate spline (TPS) approach were compared with those from the single-pollutant models, and large increases and decreases in PM2.5 and O3 were captured in the TPS estimates. However, this approach yielded significantly larger errors in the estimates than would be expected, indicating a possible area for future refinement.
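The surface-contrast idea can be sketched in NumPy with a small thin plate spline solver. This illustration rests on stated assumptions and is not the thesis's implementation. The response is assumed to be an already-adjusted outcome (for example, log counts after removing temperature, time and day-of-week effects), "average plus one unit" is taken to mean both pollutants raised by one unit together, and the function names and `smoothing` parameter are hypothetical.

```python
import numpy as np


def _tps_kernel(r):
    # Thin plate spline radial basis U(r) = r^2 log r, with U(0) = 0 by continuity.
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r ** 2 * np.log(r)
    return np.where(r > 0, k, 0.0)


def fit_tps(points, values, smoothing=0.0):
    """Fit f(p) = sum_i w_i U(|p - p_i|) + a0 + a1*x + a2*y to 2-D points.

    smoothing > 0 trades exact interpolation for a smoother surface.
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    K = _tps_kernel(dist) + smoothing * np.eye(n)
    P = np.hstack([np.ones((n, 1)), points])  # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T  # side conditions: weights orthogonal to the affine terms
    coef = np.linalg.solve(A, np.concatenate([values, np.zeros(3)]))
    return points, coef[:n], coef[n:]


def eval_tps(model, query):
    """Evaluate the fitted surface at one point or an array of points."""
    centers, w, a = model
    query = np.atleast_2d(np.asarray(query, dtype=float))
    dist = np.linalg.norm(query[:, None, :] - centers[None, :, :], axis=-1)
    return _tps_kernel(dist) @ w + a[0] + query @ a[1:]


def joint_unit_effect(model, pm_mean, o3_mean, delta=1.0):
    """Surface at (mean + delta, mean + delta) minus surface at the means."""
    base = eval_tps(model, [pm_mean, o3_mean])[0]
    raised = eval_tps(model, [pm_mean + delta, o3_mean + delta])[0]
    return raised - base
```

With `smoothing=0` the spline interpolates the data exactly. For noisy health outcomes a positive value, chosen for example by cross-validation or generalized cross-validation, would normally be required, and the joint estimate can be sensitive to that choice.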
Author Keywords: Air pollution, Environmental Epidemiology, Generalized Additive Models, Human Health, Multivariate Models, Thin Plate Splines
ADHD Symptomatology Across Adulthood: Stability and the Impacts on Important Life Outcomes
Objective: To address several methodological issues and research gaps in the current literature investigating the stability of ADHD symptomatology across adulthood and the relationships between the two core ADHD symptom dimensions (i.e., inattention and hyperactivity-impulsivity) and multiple life outcomes in adults. Method: A large sample of postsecondary students was initially assessed for ADHD symptomatology using the Conners' Adult ADHD Rating Scale (CAARS). Six years later, academic success was assessed using students' official academic records (e.g., final GPAs and degree completion status), and fifteen years later, participants were re-assessed using the CAARS and several measures of life success (e.g., relationship satisfaction, career satisfaction, and stress levels). Results: Inattention and hyperactivity-impulsivity symptoms showed strong stability across the 15-year period. Additionally, greater inattention symptoms during emerging adulthood and early middle adulthood were consistently associated with poorer life success (e.g., lower GPAs, poorer relationship and career satisfaction), particularly for men. Associations for hyperactivity-impulsivity symptoms were less consistent. Conclusion: ADHD symptomatology can be conceptualized as a stable, dimensional trait across adulthood, with robust associations with measures of life success.
Author Keywords: academic success, ADHD, adults, job satisfaction, relationship satisfaction, stability
Assessing factors associated with wealth and health of Ontario workers after permanent work injury
I drew on Bourdieu's theory of capital and theorized that the different forms of economic, cultural and social capital that injured workers possess and/or acquire over their disability trajectory may affect certain outcomes of permanent impairment. Using data from a cross-sectional survey of 494 Ontario workers with permanent impairments, I measured workers' different indicators of capital in temporal order. Hierarchical regression analyses were used to test the unique associations of workers' individual characteristics, pre-injury capital and post-injury capital with the outcomes of permanent impairment. The results show that individual characteristics, pre-injury capital and post-injury capital were all associated with workers' perceived health change, whereas pre-injury and post-injury capital were the most relevant factors in explaining workers' post-injury employment status and income recovery. When looking at the significance of individual predictors, post-injury variables were the most relevant in understanding the outcomes of permanent impairment. The findings suggest that many workers faced economic and health disadvantages after permanent work injury.
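Hierarchical regression enters predictor blocks in a fixed order and tracks how much explained variance each block adds. The sketch below uses ordinary least squares and assumes a continuous outcome, for example a hypothetical income-recovery score; a binary outcome such as employment status would need logistic regression instead. The block names follow the temporal order described above but are hypothetical, and the F-tests usually reported for each R² change are omitted.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def hierarchical_r2(blocks, y):
    """Enter predictor blocks in order; return (name, R^2, R^2 change) per step."""
    y = np.asarray(y, dtype=float)
    columns, results, prev_r2 = [], [], 0.0
    for name, block in blocks:
        # Each block may hold one predictor (1-D) or several (2-D, one per column).
        columns.append(np.asarray(block, dtype=float).reshape(len(y), -1))
        r2 = r_squared(np.hstack(columns), y)
        results.append((name, r2, r2 - prev_r2))
        prev_r2 = r2
    return results
```

A call such as `hierarchical_r2([("individual", X_ind), ("pre_injury", X_pre), ("post_injury", X_post)], y)` would report the unique contribution of each block after the earlier ones are controlled for.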
Author Keywords: Bourdieu, hierarchical regression, theory of capital, work-related disability, workers with permanent impairments
Influence of geodemographic factors on electricity consumption and forecasting models
The residential sector is a major consumer of electricity, and its demand is projected to rise by 65 percent by the end of 2050. A household's electricity consumption is determined by various factors, such as house size, the family's socio-economic status and family size. Previous studies have identified only a limited number of socio-economic and dwelling factors. In this thesis, we study the significance of 826 geodemographic factors for the electricity consumption of 4917 homes in the City of London. Geodemographic factors cover a wide array of categories, e.g., social, economic, dwelling, family structure, health, education, finance, occupation and transport. Using Spearman correlation, we identified 354 factors that are strongly correlated with electricity consumption. We also examine the impact of using geodemographic factors in designing forecasting models. In particular, we develop an encoder-decoder LSTM model whose accuracy improves when geodemographic factors are included. We believe that our study will help energy companies design better energy management strategies.
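Spearman correlation is Pearson correlation computed on ranks, so it captures any monotonic relationship, not only a linear one. A minimal screening sketch follows, assuming the factors are stored in a dict of arrays aligned with per-home consumption. The 0.5 cut-off for "strongly correlated" and the factor names are hypothetical, since the abstract does not state the criterion used. Tied values receive average ranks.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def screen_factors(factors, consumption, threshold=0.5):
    """Keep factors whose |rho| with consumption reaches the (hypothetical) threshold."""
    selected = {}
    for name, values in factors.items():
        rho = spearman(values, consumption)
        if abs(rho) >= threshold:
            selected[name] = rho
    return selected
```

A constant factor has undefined rank correlation (NumPy returns NaN with a warning), and the `abs(rho) >= threshold` test then excludes it.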
Author Keywords: Electricity forecasting, Encoder-decoder model, Geodemographic factors, Socio-economic factors
Academic Efficiency: The University-Firm Innovation Market, Intellectual Property Rights and Teaching
Universities produce a significant and increasing share of the basic research that is later commercialized by firms. We argue that the university's prominence as a producer of basic research results from a differential efficiency in research production that cannot be replicated by firms or individual agents: teaching. By using research accomplishments to signal knowledge and attract tuition-paying students, universities are uniquely positioned to undertake certain types of research projects. However, in a market for innovation without patent rights, a significant and increasing number of social-welfare-improving basic research projects cannot be initiated by either firms or universities. Extending patent rights to university-generated research elegantly redresses this issue and leaves us to ponder important questions about the future of our innovation-driven economies.
Author Keywords: Innovation, Intellectual Property Rights, Research, Science Technology and Innovation Policy
Modelling Submerged Coastal Environments: Remote Sensing Technologies, Techniques, and Comparative Analysis
Building on remote sensing and GIS methodologies for littoral zone characterization from the past decade, a series of loosely coupled models was developed to test, compare and synthesize multi-beam SONAR (MBES), Airborne LiDAR Bathymetry (ALB), and satellite-based optical data sets in the Gulf of St. Lawrence, Canada, eco-region. Bathymetry and relative intensity metrics from the MBES and ALB data sets were compared quantitatively and qualitatively, including outputs from the Benthic Terrain Modeller (BTM) tool. Substrate classification based on the relative intensities of each data set and on textural indices generated from grey level co-occurrence matrices (GLCM) was investigated. A spatial modelling framework built in ArcGIS™ for deriving bathymetric data sets from optical satellite imagery was also tested for proof of concept and validation. Where possible, efficiency gains and semi-automation for repeatable testing were achieved using ArcGIS™ ModelBuilder. The findings from this study could assist future decision makers in coastal management and hydrographic studies.
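A grey level co-occurrence matrix counts how often pairs of grey levels occur at a fixed pixel offset, and texture indices are weighted sums over it. A minimal NumPy sketch follows, assuming intensities are first quantized into a small number of levels and using a single horizontal offset. Contrast and homogeneity are two common GLCM indices chosen here for illustration, not necessarily the ones used in the study, and the function names are hypothetical.

```python
import numpy as np


def quantize(values, levels):
    """Map raw intensities linearly onto integer grey levels 0..levels-1."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        return np.zeros(v.shape, dtype=int)
    q = np.floor((v - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized co-occurrence matrix for pixel pairs at offset (dy, dx)."""
    img = np.asarray(image, dtype=int)
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1
    if symmetric:  # count each pair in both directions
        P = P + P.T
    return P / P.sum()


def contrast(P):
    # Large when neighbouring pixels differ strongly in grey level.
    i, j = np.indices(P.shape)
    return float(np.sum(P * (i - j) ** 2))


def homogeneity(P):
    # Close to 1 when most co-occurrences lie on the diagonal.
    i, j = np.indices(P.shape)
    return float(np.sum(P / (1.0 + (i - j) ** 2)))
```

In a substrate-classification workflow, such indices would typically be computed in a moving window over the intensity raster to produce texture layers.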
Keywords: Seafloor terrain characterization, Benthic Terrain Modeller (BTM), Multi-beam SONAR, Airborne LiDAR Bathymetry, Satellite Derived Bathymetry, ArcGIS™ ModelBuilder, Textural analysis, Substrate classification
SPAF-network with Saturating Pretraining Neurons
In this work, various aspects of neural networks pre-trained with denoising autoencoders (DAEs) are explored. To saturate neurons more quickly during feature learning in DAEs, an activation function that offers higher gradients is introduced. Sparsity functions applied to the hidden-layer representations are also studied. More importantly, a technique that replaces the activation functions of a fully trained DAE with logistic functions is studied; networks trained using this technique are referred to as SPAF-networks. For evaluation, the popular MNIST dataset and all three sub-datasets of the Chars74k dataset are used for classification. The SPAF-network is also analyzed for the features it learns with logistic, ReLU and custom activation functions. Finally, a roadmap for future enhancements to the SPAF-network is proposed.
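The activation-swap step can be illustrated with a single tied-weight denoising autoencoder layer in NumPy. This sketch relies on several assumptions: masking noise, tied weights, a logistic decoder and squared-error loss are illustrative choices, and ReLU stands in for the custom high-gradient activation, whose form the abstract does not give. After pretraining, the hypothetical `swap_to_logistic` helper keeps the learned weights and biases and replaces only the activation.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


def relu_grad(z):
    return (z > 0).astype(float)


def corrupt(x, drop_prob, rng):
    # Masking noise: each input is zeroed independently with probability drop_prob.
    return x * (rng.random(x.shape) >= drop_prob)


class DenseLayer:
    def __init__(self, W, b, activation):
        self.W, self.b, self.activation = W, b, activation

    def forward(self, x):
        return self.activation(x @ self.W + self.b)


def train_dae(X, n_hidden, epochs=200, lr=0.1, drop_prob=0.3, seed=0):
    """Pretrain one tied-weight DAE layer by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)  # encoder bias
    c = np.zeros(n_in)      # decoder bias
    for _ in range(epochs):
        Xt = corrupt(X, drop_prob, rng)
        Z = Xt @ W + b
        H = relu(Z)                # hidden code computed from the corrupted input
        R = logistic(H @ W.T + c)  # reconstruction target is the clean input
        # Gradient of the batch-averaged half sum of squared errors
        # with respect to the decoder pre-activation.
        dA = (R - X) * R * (1.0 - R) / len(X)
        dZ = (dA @ W) * relu_grad(Z)
        W -= lr * (Xt.T @ dZ + dA.T @ H)  # tied weights: encoder + decoder terms
        b -= lr * dZ.sum(axis=0)
        c -= lr * dA.sum(axis=0)
    return DenseLayer(W, b, relu)


def swap_to_logistic(layers):
    # Keep the pretrained weights and biases; replace only the activation.
    return [DenseLayer(layer.W, layer.b, logistic) for layer in layers]
```

The swapped layers would then initialize a network that is fine-tuned for classification with logistic units; that supervised stage is not shown.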
Author Keywords: Artificial Neural Network, AutoEncoder, Machine Learning, Neural Networks, SPAF network, Unsupervised Learning
SMOTE and Performance Measures for Machine Learning Applied to Real-Time Bidding
In the context of Real-Time Bidding (RTB), the machine learning problems of imbalanced classes and model selection are investigated. The Synthetic Minority Oversampling Technique (SMOTE) is commonly used to combat imbalanced classes, but a shortcoming is identified. Use of a distance threshold is proposed as a solution, and testing in a live RTB environment shows significant improvement. For model selection, the statistical measure Critical Success Index (CSI) is modified to add emphasis on recall. This new measure (CSI-R) is empirically compared with other measures such as accuracy, lift, efficiency, true skill score, Heidke's skill score and Gilbert's skill score. In all cases, CSI-R is shown to be better suited to the RTB industry.
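SMOTE creates a synthetic minority example by interpolating between a minority sample and one of its k nearest minority neighbours. The abstract does not say how its distance threshold is applied; this sketch assumes that neighbour pairs farther apart than `max_dist` are simply skipped, so synthetic points are not placed across gaps between separate minority clusters. The standard CSI, TP / (TP + FP + FN), is included for reference, while the recall-weighted CSI-R is not reproduced because its definition is not given. Function names and parameters are hypothetical.

```python
import numpy as np


def smote_with_threshold(X_min, n_new, k=5, max_dist=None, seed=0):
    """Generate up to n_new synthetic minority samples, skipping distant pairs."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic, attempts = [], 0
    # Cap attempts so the loop ends even if every pair exceeds max_dist.
    while len(synthetic) < n_new and attempts < 100 * n_new:
        attempts += 1
        i = rng.integers(n)
        j = neighbours[i, rng.integers(min(k, n - 1))]
        if max_dist is not None and d[i, j] > max_dist:
            continue  # pair too far apart: do not interpolate across the gap
        gap = rng.random()
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])


def csi(tp, fp, fn):
    """Critical Success Index (threat score): hits over hits plus both error types."""
    return tp / (tp + fp + fn)
```

Without a threshold, a minority point in one cluster can be paired with a neighbour in a distant cluster, placing synthetic samples in regions that may belong to the majority class.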
Author Keywords: imbalanced classes, machine learning, online advertising, performance measures, real-time bidding, SMOTE
The Effect of Listing a Stock on the S&P 500 Index on the Stock's Volatility
This paper investigates the effect of listing a stock on the S&P 500 Index on the stock's volatility using two econometric models, GARCH and EGARCH. The study addresses three main issues: first, it analyzes stock volatility in two sub-periods; second, it determines whether the announcement can account for fluctuations in the stock's price; and finally, it investigates the change in the stock's variance. After isolating the effects of external and industry shocks by using the returns on the S&P 500 Index as a proxy, the author finds evidence of a structural change in the volatility of stocks after they are added to the index. Additionally, the existence of a dominant symmetric effect, which captures the response of volatility to news, indicates that information flowing into the market increased after the stock's inclusion in the index. However, the rate at which old news is incorporated into the price falls. The empirical evidence also suggests that, on average, a stock's variance falls and that the announcement of a stock's listing on the index has little effect on its price.
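For reference, the standard GARCH(1,1) and EGARCH(1,1) specifications are shown below, with the market return included in the mean equation as the proxy for external and industry shocks. The lag orders and exact mean equation used in the paper are assumptions, and each model has its own parameter values even though the symbols are shared. In Nelson's EGARCH form, $\alpha$ measures the symmetric (size) response of volatility to news, $\gamma$ the asymmetric (sign) response, and $\beta$ the persistence of past volatility.

```latex
\begin{aligned}
r_{i,t} &= \mu + \phi\, r_{m,t} + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t,\; z_t \sim \text{i.i.d.}(0, 1),\\
\text{GARCH(1,1):}\quad \sigma_t^2 &= \omega + \alpha\, \varepsilon_{t-1}^2 + \beta\, \sigma_{t-1}^2,\\
\text{EGARCH(1,1):}\quad \ln \sigma_t^2 &= \omega + \alpha\left(|z_{t-1}| - \mathbb{E}|z_{t-1}|\right) + \gamma\, z_{t-1} + \beta \ln \sigma_{t-1}^2.
\end{aligned}
```

Here $r_{i,t}$ is the stock's return and $r_{m,t}$ the S&P 500 return. Estimating either model separately on the pre- and post-inclusion sub-periods and comparing the parameters is one way to examine a structural change in volatility.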
Author Keywords: EGARCH, GARCH, S&P 500 Index, Symmetric Effect, Volatility