Hurley, Richard
An Investigation of a Hybrid Computational System for Cloud Gaming
Video games have always been intrinsically linked to the technology available to the medium, and improvements in technology have historically translated directly into improvements in video games. Recently, however, this has not been the case: one technology that video games have not fully leveraged is the Cloud. This thesis investigates a potential way for video games to leverage Cloud technology. The methodology compares the relative performance of a Local, a Cloud, and a proposed Hybrid model of video game execution. Comparing these results, we find that a Hybrid model has the potential to increase performance in Cloud gaming and to improve the stability of overall gameplay.
Author Keywords: cloud, cloud gaming, streaming, video game
Machine Learning for Aviation Data
This thesis is part of an industry project conducted in collaboration with an aviation technology company on pilot performance assessment. In this project, we propose using pilots' training data to develop a model that can recognize pilots' activity patterns for evaluation. The data take the form of multivariate time series representing a pilot's actions during maneuvers. The main contribution of this thesis is the preprocessing and transformation of such a dataset. A central difficulty in time series classification is that sequences vary in length along the time dimension; to address this, I developed an algorithm that formats time series data into equal-length series.
Three classification methods and two transformation methods were used, yielding six models for comparison. The initial accuracy was 40%; through optimization by resampling, we increased the accuracy to 60%.
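To illustrate the equal-length formatting step, the following is a minimal sketch, assuming linear interpolation as the resampling strategy; the function name and target length are illustrative rather than the thesis's exact algorithm. It rescales each channel of a multivariate series to a fixed number of time steps so that differently sized maneuvers become directly comparable.

```python
import numpy as np

def resample_to_length(series: np.ndarray, target_len: int = 100) -> np.ndarray:
    """Linearly interpolate a (time_steps, n_vars) series to target_len steps."""
    t_old = np.linspace(0.0, 1.0, num=series.shape[0])
    t_new = np.linspace(0.0, 1.0, num=target_len)
    # Interpolate each variable (column) independently.
    return np.column_stack(
        [np.interp(t_new, t_old, series[:, v]) for v in range(series.shape[1])]
    )

# Example: two maneuvers of different durations become comparable.
short = np.random.rand(80, 6)    # 80 time steps, 6 sensor channels
long_ = np.random.rand(250, 6)   # 250 time steps, 6 sensor channels
assert resample_to_length(short).shape == resample_to_length(long_).shape == (100, 6)
```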
Author Keywords: Data Mining, K-NN, Machine Learning, Multivariate Time Series Classification, Time Series Forest
SPAF-network with Saturating Pretraining Neurons
In this work, various aspects of neural networks pre-trained with denoising autoencoders (DAE) are explored. To saturate neurons more quickly for feature learning in a DAE, an activation function that offers higher gradients is introduced. Moreover, the introduction of sparsity functions applied to the hidden-layer representations is studied. Most importantly, a technique that swaps the activation functions of a fully trained DAE to logistic functions is studied; networks trained using this technique are referred to as SPAF-networks. For evaluation, the popular MNIST dataset as well as all 3 sub-datasets of the Chars74k dataset are used for classification. The SPAF-network is also analyzed for the features it learns with a logistic, a ReLU, and a custom activation function. Lastly, a roadmap of future enhancements to the SPAF-network is proposed.
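As a hedged illustration of the activation-swap idea, the PyTorch sketch below pretrains with a steeper logistic variant (higher gradients push neurons toward saturation sooner) and then exchanges it for a plain logistic function; the class name and gain value are hypothetical, not the thesis's custom activation.

```python
import torch
import torch.nn as nn

class SteepSigmoid(nn.Module):
    """Logistic variant with a gain factor; the larger gradient near zero
    drives neurons toward saturation faster during DAE pretraining."""
    def __init__(self, gain: float = 4.0):
        super().__init__()
        self.gain = gain

    def forward(self, x):
        return torch.sigmoid(self.gain * x)

encoder = nn.Sequential(nn.Linear(784, 256), SteepSigmoid())
# ... pretrain the denoising autoencoder here ...

# SPAF step: swap the trained activation for a plain logistic function.
encoder[1] = nn.Sigmoid()
```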
Author Keywords: Artificial Neural Network, AutoEncoder, Machine Learning, Neural Networks, SPAF network, Unsupervised Learning
An Investigation of the Impact of Big Data on Bioinformatics Software
As the generation of genetic data accelerates, Big Data has an increasing impact on the way bioinformatics software is used. Experiments are becoming larger and more complex than the software's designers originally envisioned. One way to deal with this problem is parallel computing.
Using the program Structure as a case study, we investigate ways in which to counteract the challenges created by the growing datasets. We propose an OpenMP and an OpenMP-MPI hybrid parallelization of the MCMC steps, and analyse the performance in various scenarios.
The results indicate that the parallelizations produce significant speedups over the serial version in all scenarios tested. This allows the available hardware to be used more efficiently by adapting the program to the parallel architecture. This is important because it not only reduces the time required to perform existing analyses, but also opens the door to new analyses that were previously impractical.
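The speedups rest on the fact that, within one MCMC sweep, the per-individual updates are conditionally independent given the current global state, so the loop over individuals can be distributed while the chain itself stays sequential. The sketch below illustrates this structure with Python's multiprocessing as a language-neutral analogue of the OpenMP version; the update function is a placeholder, not Structure's actual sampler.

```python
from multiprocessing import Pool
import random

def update_individual(args):
    """Placeholder for one per-individual update within an MCMC sweep."""
    value, seed = args
    rng = random.Random(seed)
    return value + rng.gauss(0.0, 0.1)  # stand-in for a Gibbs/MH step

def mcmc_sweep(state, pool):
    # The per-individual updates are independent given the current state,
    # so they can run in parallel, analogous to an OpenMP parallel for.
    seeds = [random.randrange(2**32) for _ in state]
    return pool.map(update_individual, list(zip(state, seeds)))

if __name__ == "__main__":
    state = [0.0] * 1000              # one value per sampled individual
    with Pool(processes=4) as pool:
        for _ in range(100):          # MCMC iterations remain sequential
            state = mcmc_sweep(state, pool)
```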
Author Keywords: Big Data, HPC, MCMC, parallelization, speedup, Structure
An Investigation of Load Balancing in a Distributed Web Caching System
With the exponential growth of the Internet, performance is an issue, as bandwidth is often limited. A scalable way to reduce the bandwidth required is Web caching, which (especially at the proxy level) has been shown to be quite successful at addressing this issue. However, as the number and needs of the clients grow, it becomes infeasible and inefficient to rely on a single Web cache. To address this concern, the Web caching system can be set up in a distributed manner, allowing multiple machines to work together to meet the clients' needs. Further efficiency could be achieved by balancing the workload across all the Web caches in the system. This thesis investigates the benefits of load balancing in a distributed Web caching environment as a means of improving response times and reducing bandwidth.
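One simple policy in this space routes each request to the cache that "owns" its URL, preserving locality, and diverts to the least-loaded cache only when the owner is overloaded. The sketch below is a minimal, assumed version of such an adaptive policy, offered for illustration only; it is not the specific load-balancing scheme evaluated in this thesis.

```python
import hashlib

class CacheNode:
    def __init__(self, name: str):
        self.name = name
        self.active_requests = 0  # stand-in for a reported load metric

def pick_cache(url: str, nodes, threshold: int = 10) -> CacheNode:
    """Hash-route for cache locality, but divert when the home node is busy."""
    home = nodes[int(hashlib.sha1(url.encode()).hexdigest(), 16) % len(nodes)]
    if home.active_requests <= threshold:
        return home
    # Adaptive step: fall back to the least-loaded node in the system.
    return min(nodes, key=lambda n: n.active_requests)

nodes = [CacheNode(f"cache-{i}") for i in range(4)]
nodes[0].active_requests = 50  # simulate one overloaded cache
print(pick_cache("http://example.com/page", nodes).name)
```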
Author Keywords: adaptive load sharing, Distributed systems, Load Balancing, Simulation, Web Caching
Support Vector Machines for Automated Galaxy Classification
Support Vector Machines (SVMs) are deterministic, supervised machine learning models that have been successfully applied to many areas of research. They are heavily grounded in mathematical theory and are effective at processing high-dimensional data. This thesis models a variety of galaxy classification tasks using SVMs and data from the Galaxy Zoo 2 project. SVM parameters were tuned in parallel using resources from Compute Canada, and a total of four experiments were completed to determine whether invariance training and ensembles can be used to improve classification performance. It was found that SVMs performed well at many of the galaxy classification tasks examined, while the additional techniques explored did not provide a considerable improvement.
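As a small illustration of the tuning workflow, the scikit-learn sketch below searches an SVM parameter grid with cross-validation, evaluating candidates in parallel across cores; the grid values are illustrative, not the thesis's actual search space.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid; the thesis tuned its own ranges on Compute Canada.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1], "kernel": ["rbf"]}

search = GridSearchCV(
    SVC(),
    param_grid,
    cv=5,       # 5-fold cross-validation per parameter combination
    n_jobs=-1,  # evaluate combinations in parallel on all available cores
)
# search.fit(features, labels)  # features: galaxy morphology vectors
```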
Author Keywords: Compute Canada, Kernel, SDSS, SHARCNET, Support Vector Machine, SVM
Fraud Detection in Financial Businesses Using Data Mining Approaches
The purpose of this research is to apply four methods to two datasets, a Synthetic dataset and a Real-World dataset, and to compare the results with the intention of arriving at methods to prevent fraud. The methods used are Logistic Regression, Isolation Forest, an Ensemble Method, and Generative Adversarial Networks (GANs). Results show that all four models achieve accuracies between 91% and 99%, except Isolation Forest, which gave 69% accuracy on the Synthetic dataset.
The four models detect fraud well when built on a training set and evaluated on a test set. Logistic Regression achieves good results with less computational effort. Isolation Forest achieves lower accuracy when the data is sparse and not preprocessed correctly. Ensemble Models achieve the highest accuracy on both datasets. The GAN achieves good results but overfits if a large number of epochs is used. Future work could incorporate other classifiers.
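For a concrete sense of the supervised versus unsupervised contrast above, the sketch below runs Logistic Regression and Isolation Forest on a synthetic, imbalanced stand-in dataset; it uses none of the thesis's data or tuned models, and accuracy on such imbalanced data is naturally inflated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in transaction data: roughly 1% of samples labelled fraudulent.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Supervised baseline: cheap to train, as noted above.
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("logistic accuracy:", accuracy_score(y_te, logit.predict(X_te)))

# Unsupervised alternative: Isolation Forest flags outliers as -1.
iso = IsolationForest(contamination=0.01, random_state=0).fit(X_tr)
iso_pred = (iso.predict(X_te) == -1).astype(int)
print("isolation forest accuracy:", accuracy_score(y_te, iso_pred))
```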
Author Keywords: Ensemble Method, GAN, Isolation forest, Logistic Regression, Outliers
Representation Learning with Restorative Autoencoders for Transfer Learning
Deep Neural Networks (DNNs) have reached human-level performance in numerous tasks in the domain of computer vision. DNNs are efficient for both classification and the more complex task of image segmentation. These networks are typically trained on thousands of images, which are often hand-labelled by domain experts. This labelling bottleneck creates a promising research area: training accurate segmentation networks with fewer labelled samples.
This thesis explores effective methods for learning deep representations from unlabelled images. We train a Restorative Autoencoder Network (RAN) to denoise synthetically corrupted images. The weights of the RAN are then fine-tuned on a labelled dataset from the same domain for image segmentation.
We use three different segmentation datasets to evaluate our methods. In our experiments, we demonstrate that, with our methods, only a fraction of the data is required to achieve the same accuracy as a network trained with a large labelled dataset.
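A minimal PyTorch sketch of the transfer recipe, assuming a small convolutional encoder-decoder (the RAN's actual architecture is not reproduced here): pretrain by restoring synthetically corrupted images, then reuse the encoder under a segmentation head for fine-tuning on the labelled set.

```python
import torch
import torch.nn as nn

# Stand-in encoder-decoder trained to undo synthetic corruption.
encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(32, 1, 3, padding=1)
ran = nn.Sequential(encoder, decoder)

images = torch.rand(8, 1, 64, 64)                      # unlabelled batch
corrupted = images + 0.2 * torch.randn_like(images)    # synthetic corruption
loss = nn.functional.mse_loss(ran(corrupted), images)  # restoration objective
loss.backward()

# Transfer step: keep the pretrained encoder, attach a segmentation head,
# and fine-tune on the (much smaller) labelled dataset.
seg_head = nn.Conv2d(32, 2, kernel_size=1)             # 2 classes, 1x1 conv
segmenter = nn.Sequential(encoder, seg_head)
```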
Author Keywords: deep learning, image segmentation, representation learning, transfer learning
Cloud Versus Bare Metal: A comparison of a high performance computing cluster running in a commercial cloud and on a traditional hardware cluster using OpenMP and OpenMPI
A comparison of two high performance computing clusters, one running on AWS and one on Sharcnet, was conducted to determine which scenarios yield the best performance. Algorithm complexity ranged from O(n) to O(n³). Data sizes ranged from 195 KB to 2 GB. The Sharcnet hardware consisted of Intel E5-2683 and Intel E7-4850 processors with memory sizes ranging from 256 GB to 3072 GB. On AWS, C4.8xlarge instances were used, which run on Intel Xeon E5-2666 processors with 60 GB of memory per instance. AWS was able to launch jobs immediately regardless of job size; the only limiting factors on AWS were algorithm complexity and memory usage, suggesting a memory bottleneck. Sharcnet had the best performance but could be hampered by its job scheduler. In conclusion, Sharcnet is best used when the algorithm is complex and has high memory usage, while AWS is best used when immediate processing is required.
Author Keywords: AWS, cloud, HPC, parallelism, Sharcnet
Development of a Cross-Platform Solution for Calculating Certified Emission Reduction Credits in Forestry Projects under the Kyoto Protocol of the UNFCCC
This thesis presents an exploration of the requirements for, and the development of, a software tool to calculate Certified Emission Reduction (CER) credits for afforestation and reforestation projects conducted under the Clean Development Mechanism (CDM). We examine the relevant methodologies and tools to determine what is required to create a software package that can support a wide variety of projects involving a large variety of data and computations. During requirements gathering, it was determined that the software package would need to support entering and editing equations at runtime. To create the software, we used Java as the programming language, an H2 database to store our data, and an XML file to store our configuration settings. Through these choices, we were able to build a cross-platform software solution for the purpose outlined above. The end result is a versatile software tool with which users can create and customize projects to meet their unique needs, as well as use the provided features to streamline the management of their CDM projects.
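The thesis implements this in Java with an H2 database; purely as a language-neutral illustration of the runtime-editable-equation requirement, the Python sketch below evaluates a user-entered arithmetic expression over named project variables while rejecting anything beyond arithmetic. The example equation is hypothetical.

```python
import ast
import operator

# Only arithmetic node types are allowed, so user input cannot run code.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def evaluate(expr: str, variables: dict) -> float:
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported syntax in equation")
    return walk(ast.parse(expr, mode="eval"))

# Hypothetical biomass equation a user might enter and later edit at runtime.
print(evaluate("0.5 * area * growth_rate ** 0.9",
               {"area": 120.0, "growth_rate": 3.2}))
```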
Author Keywords: Carbon Emissions, Climate Change, Forests, Java, UNFCCC, XML