Applied Modeling and Quantitative Methods
A Framework for Testing Time Series Interpolators
The spectrum of a time series is a characteristic function describing its frequency properties. Spectrum estimation methods require time series data to be contiguous in order for robust estimators to retain their performance. This poses a fundamental challenge, especially for real-world scientific data, which is often plagued by missing values and/or irregularly recorded measurements. One line of research devoted to this problem seeks to repair the original time series through interpolation. Several algorithms have proven successful for interpolating considerably large gaps of missing data, but most are valid only for stationary time series: processes whose statistical properties are time-invariant, which is not a common property of real-world data. The Hybrid Wiener interpolator is a method designed for repairing nonstationary data, rendering it suitable for spectrum estimation. This thesis presents a computational framework for systematically testing the statistical performance of this method under changes to gap structure and departures from the stationarity assumption. A comprehensive audit of the Hybrid Wiener interpolator against other state-of-the-art algorithms is also presented.
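The core testing loop described above can be sketched in miniature: withhold a block of points from a series with known ground truth, fill the gap with an interpolator, and score the fill. This sketch uses simple linear interpolation as a stand-in; the framework itself evaluates the Hybrid Wiener interpolator, and the series, gap placement, and error metric here are invented for illustration.

```python
import numpy as np

# Minimal sketch: punch a gap into a known series, fill it with a
# stand-in interpolator (linear), and score against the withheld truth.
rng = np.random.default_rng(42)
t = np.arange(500)
x = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(500)

gap = slice(200, 240)                 # contiguous block of "missing" points
observed = x.copy()
observed[gap] = np.nan

known = ~np.isnan(observed)
filled = np.interp(t, t[known], observed[known])   # stand-in interpolator

# Root-mean-square error over the gap, against the withheld true values
rmse = np.sqrt(np.mean((filled[gap] - x[gap]) ** 2))
print(f"RMSE over the gap: {rmse:.3f}")
```

Varying the gap length, gap count, and the degree of nonstationarity of the generating process, then repeating this loop many times, is the kind of systematic experiment the framework automates.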
Author Keywords: applied statistics, hybrid wiener interpolator, imputation, interpolation, R statistical software, time series
The Relationship Between Precarious Employment, Behaviour Addictions and Substance Use Among Canadian Young Adults: Insights From The Quinte Longitudinal Survey
This thesis utilized a unique dataset, the Quinte Longitudinal Survey, to explore relationships between precarious employment and a range of mental health problems in a representative sample of Ontario young adults. Study 1 focused on the association between precarious employment and various behavioural addictions (such as problem gambling, video gaming, internet use, exercise, compulsive shopping, and sex). The results showed that precariously employed men were preoccupied with gambling and sex while their female counterparts preferred shopping. Gambling and excessive shopping diminished over time while excessive sexual practices increased. Study 2 focused on the association between precarious employment and substance abuse (involving tobacco, alcohol, cannabis, hallucinogens, stimulants, and other substances). The results showed that men used cannabis more than women, and that the non-precariously employed group abused alcohol more than individuals in the precarious group. This research has implications for both health care professionals and intervention program developers working with young adults in precarious jobs.
Author Keywords: Behaviour Addictions, Precarious Employment, Substance Abuse, Young Adults
Representation Learning with Restorative Autoencoders for Transfer Learning
Deep Neural Networks (DNNs) have reached human-level performance in numerous tasks in the domain of computer vision. DNNs are effective for both classification and the more complex task of image segmentation. These networks are typically trained on thousands of images that are often hand-labelled by domain experts. This labelling bottleneck motivates a promising research area: training accurate segmentation networks with fewer labelled samples.
This thesis explores effective methods for learning deep representations from unlabelled images. We train a Restorative Autoencoder Network (RAN) to denoise synthetically corrupted images. The weights of the RAN are then fine-tuned on a labelled dataset from the same domain for image segmentation.
We use three different segmentation datasets to evaluate our methods. In our experiments, we demonstrate that our methods require only a fraction of the labelled data to achieve the same accuracy as a network trained on a large labelled dataset.
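The self-supervised pretext task described above can be sketched as follows: unlabelled images are synthetically corrupted so that a restorative autoencoder can be trained to map corrupted inputs back to the clean originals. The corruption recipe below (additive noise plus random pixel masking) is illustrative only, not the thesis's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(image, noise_std=0.2, mask_frac=0.25):
    """Return a corrupted copy: additive Gaussian noise plus zeroed pixels."""
    noisy = image + noise_std * rng.standard_normal(image.shape)
    mask = rng.random(image.shape) < mask_frac
    noisy[mask] = 0.0                        # drop a random fraction of pixels
    return np.clip(noisy, 0.0, 1.0)

clean = rng.random((32, 32))                 # stand-in for an unlabelled image
corrupted = corrupt(clean)

# Training pair for a restorative autoencoder: input = corrupted, target = clean.
reconstruction_error = np.mean((corrupted - clean) ** 2)
print(f"MSE of corrupted vs clean: {reconstruction_error:.4f}")
```

Because the clean image is its own training target, no human labels are consumed during pre-training; labelled data is needed only for the later fine-tuning stage.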
Author Keywords: deep learning, image segmentation, representation learning, transfer learning
Combinatorial Collisions in Database Matching: With Examples from DNA
Databases containing information such as location points, web searches, and financial transactions are becoming the new normal as technology advances. Consequently, searches and cross-referencing in big data are becoming a common problem as computing and statistical analysis increasingly allow the contents of such databases to be analyzed and dredged for data. Searches through big data are frequently done without a hypothesis formulated beforehand, and as these databases grow and become more complex, the room for error also increases. Regardless of how these searches are framed, the data they collect may lead to false convictions. DNA databases are of particular interest, since DNA is often viewed as significant evidence; however, such evidence is sometimes not interpreted properly in the courtroom. In this thesis, we present and validate a framework for investigating various collisions within databases using Monte Carlo simulations, with examples from DNA. We also discuss how DNA evidence may be wrongly portrayed in the courtroom, and the explanation behind this. We then outline the problems that may occur when numerous types of databases are searched for suspects, and a framework to address these problems.
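A Monte Carlo collision study of the kind described above can be sketched in the style of the birthday problem: if k profiles are drawn from a space of N equally likely genotypes, how often do at least two match by chance? The uniform model below is a deliberate simplification (real DNA profile frequencies are not uniform), and the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def collision_probability(n_profiles, space_size, trials=20_000):
    """Estimate P(at least one repeated profile) by Monte Carlo simulation."""
    hits = 0
    for _ in range(trials):
        draws = rng.integers(0, space_size, size=n_profiles)
        if len(np.unique(draws)) < n_profiles:   # at least one collision
            hits += 1
    return hits / trials

# Classic check: 23 draws from 365 possibilities collide about half the time.
est = collision_probability(n_profiles=23, space_size=365)
print(f"estimated collision probability: {est:.3f}")
```

Scaling up the profile space and the number of database entries in the same loop shows how quickly chance matches become likely in large databases, even when any single match seems individually improbable.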
Author Keywords: big data analysis, collisions, database searches, DNA databases, monte carlo simulation
Pathways to Innovation: Modelling University-to-Firm Research Development
Research and development activities conducted at universities and firms fuel economic growth and play a key role in the process of innovation. Specifically, prior research has investigated the widespread university-to-firm research development path and concluded that universities are better suited for early stages of research while firms are better positioned for later stages. This thesis aims to present a novel explanation for the pervasive university-to-firm research development path. The model developed uses game theory to visualize and analyze interactions between a firm and a university under different strategies. The results reveal that academic research signals knowledge, which helps attract tuition-paying students. Generating these tuition revenues is facilitated by university research discoveries, which, once published, a firm can build upon to make new innovative products. In an environment of weak intellectual property rights, moreover, the university-to-firm research development path enables firms to bypass the hefty costs involved in basic research activities. The model also provides a range of solution scenarios in which a university and a firm may find it viable to initiate a research line.
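The game-theoretic approach can be sketched with a toy 2x2 normal-form game between a university (rows: conduct basic research or abstain) and a firm (columns: build on published research or do its own basic research). The payoff numbers below are invented purely for illustration; the thesis's actual model, strategies, and payoffs differ.

```python
import numpy as np

# Rows: university {0: research, 1: abstain}
# Cols: firm {0: build on publications, 1: own basic research}
U = np.array([[4, 2],    # university payoffs (tuition signal + publications)
              [1, 0]])
F = np.array([[5, 1],    # firm payoffs (row 0, col 0: bypasses research costs)
              [2, 3]])

def pure_nash(U, F):
    """Return (row, col) cells where neither player gains by deviating."""
    eqs = []
    for i in range(2):
        for j in range(2):
            if U[i, j] >= U[1 - i, j] and F[i, j] >= F[i, 1 - j]:
                eqs.append((i, j))
    return eqs

print(pure_nash(U, F))   # [(0, 0)]: university researches, firm builds on it
```

With these illustrative payoffs, the unique pure-strategy Nash equilibrium is the university-to-firm path itself: the university researches and the firm builds on the published results, mirroring the pattern the thesis seeks to explain.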
Author Keywords: Game theory, Intellectual property rights, Nash equilibrium, Research and development, University-to-firm research path
Cloud Versus Bare Metal: A comparison of a high performance computing cluster running in a commercial cloud and on a traditional hardware cluster using OpenMP and OpenMPI
A comparison of two high performance computing clusters, one running on AWS and one on Sharcnet, was conducted to determine which scenarios yield the best performance. Algorithm complexity ranged from O(n) to O(n³). Data sizes ranged from 195 KB to 2 GB. The Sharcnet hardware consisted of Intel E5-2683 and Intel E7-4850 processors with memory sizes ranging from 256 GB to 3072 GB. On AWS, C4.8xlarge instances were used, which run on Intel Xeon E5-2666 processors with 60 GB of memory per instance. AWS was able to launch jobs immediately regardless of job size. The only limiting factors on AWS were algorithm complexity and memory usage, suggesting a memory bottleneck. Sharcnet had the best performance but could be hampered by the job scheduler. In conclusion, Sharcnet is best used when the algorithm is complex and has high memory usage, while AWS is best used when immediate processing is required.
Author Keywords: AWS, cloud, HPC, parallelism, Sharcnet
Psychometric Properties of a Scale Developed from a Three-Factor Model of Social Competency
While existing models of emotional intelligence (EI) generally recognize the importance of social competencies (SC), there is a tendency in the literature to narrow the focus to competencies that pertain to the self. Given the experiential and perceptual differences between self- vs. other-oriented emotional abilities, this is an important limitation of existing EI models and assessment tools. This thesis explores the psychometric properties of a multidimensional model of SC. Chapter 1 traces the evolution of work on SC in modern psychology and describes the multidimensional model of SC under review. Chapter 2 replicates this model across a variety of samples and explores the model's construct validity via basic personality and EI constructs. Chapter 3 further explores the predictive validity of the SC measure in a group of project managers, relating it to several success and wellness variables. Chapter 4 examines potential applications of the model and suggestions for further research.
Author Keywords: emotional intelligence, project management, social competency, work readiness
Sinc-Collocation Difference Methods for Solving the Gross-Pitaevskii Equation
The time-dependent Gross-Pitaevskii Equation, describing the movement of particles in quantum mechanics, may not be solved analytically due to its inherent nonlinearity. Hence numerical methods are important for approximating the solution. This study develops a scheme that is discrete in time and space to simulate the solution on a finite domain, using the Crank-Nicolson difference method and Sinc Collocation Methods (SCMs), respectively. In theory and practice, the time discretization reduces errors with second-order accuracy, while SCMs reduce errors exponentially. A new SCM with a unique boundary treatment is proposed and compared with the original SCM and other similar numerical techniques in terms of time costs and numerical errors. As a result, the new SCM reduces errors faster than the original one. Also, to attain the same accuracy, the new SCM interpolates fewer nodes than the original SCM, which saves computational costs. The new SCM is capable of approximating partial differential equations under different boundary conditions, which allows it to be applied extensively in fitting theory.
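For context, a commonly used dimensionless one-dimensional form of the equation, and the Crank-Nicolson time-stepping idea, can be written as follows (standard notation, not necessarily the thesis's exact formulation; L denotes the spatial operator on the right-hand side):

```latex
% Dimensionless 1D Gross-Pitaevskii equation
i\,\frac{\partial \psi}{\partial t}
  = -\frac{1}{2}\frac{\partial^{2} \psi}{\partial x^{2}}
    + V(x)\,\psi + g\,|\psi|^{2}\psi
  \equiv L(\psi).

% Crank-Nicolson: average L at time levels n and n+1,
% giving second-order accuracy in \Delta t
i\,\frac{\psi^{\,n+1} - \psi^{\,n}}{\Delta t}
  = \frac{1}{2}\bigl[\, L(\psi^{\,n+1}) + L(\psi^{\,n}) \,\bigr].
```

In the study's scheme, the spatial derivative inside L is then approximated by Sinc collocation, whose exponential convergence complements the second-order accuracy of the time step.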
Author Keywords: Crank-Nicolson difference method, Gross-Pitaevskii Equation, Sinc-Collocation methods
Educational Data Mining and Modelling on Trent University Students' Academic Performance
Higher education is important: it enhances both individual and social welfare by improving productivity, life satisfaction, and health outcomes, and by reducing rates of crime. Universities play a critical role in providing that education. Because academic institutions face resource constraints, it is important that they deploy resources in support of student success in the most efficient ways possible. To inform that efficient deployment, this research analyzes institutional data on undergraduate student performance to identify predictors of student success, measured by GPA, rates of credit accumulation, and graduation rates. Using methods of cluster analysis and machine learning, the analysis yields predictions of the probabilities of individual success.
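The clustering step can be sketched in miniature: group students by performance features with k-means, then read each cluster's historical graduation rate as a success probability for similar students. The two features, the value of k, and the synthetic cohort below are invented for illustration; they are not the thesis's actual variables or data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic cohort: column 0 = GPA (0-4), column 1 = credits earned per year
X = np.vstack([
    rng.normal([3.5, 4.8], 0.2, size=(50, 2)),   # on-track students
    rng.normal([2.0, 3.0], 0.3, size=(50, 2)),   # at-risk students
])
graduated = np.array([1] * 50 + [0] * 50)

def kmeans(X, iters=20):
    """Tiny 2-means; centres seeded one per region for a stable toy run."""
    centers = X[[0, 50]].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    return labels, centers

labels, centers = kmeans(X)
for j in range(2):
    rate = graduated[labels == j].mean()
    print(f"cluster {j}: graduation rate {rate:.2f}")
```

A new student is then assigned to the nearest cluster centre, and that cluster's graduation rate serves as a first-pass estimate of their probability of success.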
Author Keywords: Educational data mining, Students' academic performance modelling
Development of a Cross-Platform Solution for Calculating Certified Emission Reduction Credits in Forestry Projects under the Kyoto Protocol of the UNFCCC
This thesis presents an exploration of the requirements for and development of a software tool to calculate Certified Emission Reduction (CER) credits for afforestation and reforestation projects conducted under the Clean Development Mechanism (CDM). We examine the relevant methodologies and tools to determine what is required to create a software package that can support a wide variety of projects involving many kinds of data and computations. During requirements gathering, it was determined that the software package would need to support the ability to enter and edit equations at runtime. To create the software we used Java as the programming language, an H2 database to store our data, and an XML file to store our configuration settings. These choices allow us to build a cross-platform software solution for the purpose outlined above. The end result is a versatile software tool through which users can create and customize projects to meet their unique needs as well as utilize the features provided to streamline the management of their CDM projects.
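The key requirement above, equations entered as text and evaluated at runtime, can be sketched as a small safe expression evaluator. The thesis's actual tool is written in Java; this Python sketch only illustrates the idea, and the sample formula and variable names are invented.

```python
import ast
import operator as op

# Whitelist of arithmetic operations; anything else is rejected.
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def evaluate(expr, variables):
    """Safely evaluate a user-entered arithmetic expression over named variables."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        raise ValueError("disallowed syntax in equation")
    return walk(ast.parse(expr, mode="eval"))

# e.g. a made-up biomass-carbon style formula, editable without recompiling
formula = "area_ha * biomass_t_per_ha * 0.5"
print(evaluate(formula, {"area_ha": 120, "biomass_t_per_ha": 90}))  # 5400.0
```

Storing such formula strings in the database (rather than hard-coding them) is what lets users adapt the tool to the differing equations of individual CDM methodologies.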
Author Keywords: Carbon Emissions, Climate Change, Forests, Java, UNFCCC, XML