Feng, Wenying
Influence of geodemographic factors on electricity consumption and forecasting models
The residential sector is a major consumer of electricity, and its demand is projected to rise by 65 percent by the end of 2050. The electricity consumption of a household is determined by various factors, such as house size, the socio-economic status of the family, and family size. Previous studies have identified only a limited number of socio-economic and dwelling factors. In this thesis, we study the significance of 826 geodemographic factors on electricity consumption for 4917 homes in the City of London. Geodemographic factors cover a wide array of categories, including social, economic, dwelling, family structure, health, education, finance, occupation, and transport. Using Spearman correlation, we have identified 354 factors that are strongly correlated with electricity consumption. We also examine the impact of using geodemographic factors in designing forecasting models. In particular, we develop an encoder-decoder LSTM model which shows improved accuracy with geodemographic factors. We believe that our study will help energy companies design better energy management strategies.
Author Keywords: Electricity forecasting, Encoder-decoder model, Geodemographic factors, Socio-economic factors
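The Spearman screening mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's code: the factor names, the toy data, and the 0.7 cut-off for "strongly correlated" are all assumptions made for illustration, since the abstract does not state the threshold used.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant input (an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def screen_factors(factors: Dict[str, Sequence[float]],
                   consumption: Sequence[float],
                   threshold: float) -> List[str]:
    """Keep factors whose |rho| with consumption meets the (assumed) threshold."""
    return sorted(name for name, column in factors.items()
                  if abs(spearman(column, consumption)) >= threshold)


# Hypothetical toy data: five homes, three made-up factors.
consumption = [10.0, 20.0, 30.0, 40.0, 50.0]
factors = {
    "floor_area": [1.0, 4.0, 9.0, 16.0, 25.0],  # monotone increasing
    "vacancy": [5.0, 4.0, 3.0, 2.0, 1.0],       # monotone decreasing
    "noise": [3.0, 1.0, 2.0, 5.0, 4.0],         # weaker association
}
selected = screen_factors(factors, consumption, threshold=0.7)
```

Because Spearman correlation works on ranks, the non-linear but monotone `floor_area` column is kept alongside the negatively related `vacancy` column, while `noise` falls below the assumed cut-off.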
SMOTE and Performance Measures for Machine Learning Applied to Real-Time Bidding
In the context of Real-Time Bidding (RTB), the machine learning problems of imbalanced classes and model selection are investigated. The Synthetic Minority Oversampling Technique (SMOTE) is commonly used to combat imbalanced classes, but a shortcoming is identified. Use of a distance threshold is identified as a solution, and testing in a live RTB environment shows significant improvement. For model selection, the statistical measure Critical Success Index (CSI) is modified to add emphasis on recall. This new measure (CSI-R) is empirically compared with other measures such as accuracy, lift, efficiency, true skill score, Heidke's skill score and Gilbert's skill score. In all cases, CSI-R is shown to be better suited to the RTB industry.
Author Keywords: imbalanced classes, machine learning, online advertising, performance measures, real-time bidding, SMOTE
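As background for the performance measures named in the abstract above, the sketch below computes the standard Critical Success Index, TP / (TP + FP + FN), next to recall and accuracy on a small imbalanced example. The abstract does not define the recall-weighted variant CSI-R, so it is deliberately not implemented here; the labels are made-up data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def critical_success_index(tp: int, fp: int, fn: int) -> float:
    """Standard CSI: hits over hits + false alarms + misses (ignores tn)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were predicted positive."""
    denom = tp + fn
    return tp / denom if denom else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Made-up imbalanced labels: 2 positives among 10 samples.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
always_negative = [0] * 10
one_hit = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

tp0, fp0, fn0, tn0 = confusion_counts(y_true, always_negative)
tp1, fp1, fn1, tn1 = confusion_counts(y_true, one_hit)
```

Both classifiers reach the same accuracy on this data, but only the second has a non-zero CSI, which illustrates why accuracy alone can mislead when classes are imbalanced.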
Time Series Algorithms in Machine Learning - A Graph Approach to Multivariate Forecasting
Forecasting future values of time series has long been a field with many and varied applications, from climate and weather forecasting to stock prediction and economic planning to the control of industrial processes. Many of these problems involve not only a single time series but many simultaneous series which may influence each other. This thesis provides machine-learning-based methods for handling such problems.
We first consider single time series with both single and multiple features. We review the algorithms and unique challenges involved in applying machine learning to time series. Many machine learning algorithms when used for regression are designed to produce a single output value for each timestamp of interest with no measure of confidence; however, evaluating the uncertainty of the predictions is an important component for practical forecasting. We therefore discuss methods of constructing uncertainty estimates in the form of prediction intervals for each prediction. Stability over long time horizons is also a concern for these algorithms as recursion is a common method used to generate predictions over long time intervals. To address this, we present methods of maintaining stability in the forecast even over large time horizons. These methods are applied to an electricity forecasting problem where we demonstrate the effectiveness for support vector machines, neural networks and gradient boosted trees.
We next consider spatiotemporal problems, which consist of multiple interlinked time series, each of which may contain multiple features. We represent these problems using graphs, allowing us to learn relationships using graph neural networks. Existing methods of doing this generally make use of separate time and spatial (graph) layers, or simply replace operations in temporal layers with graph operations. We show that these approaches have difficulty learning relationships that contain time lags of several time steps. To address this, we propose a new layer inspired by the long short-term memory (LSTM) recurrent neural network which adds a distinct memory state dedicated to learning graph relationships while keeping the original memory state. This allows the model to consider temporally distant events at other nodes without affecting its ability to model long-term relationships at a single node. We show that this model is capable of learning the long-term patterns that existing models struggle with. We then apply this model to a number of real-world bike-share and traffic datasets where we observe improved performance when compared to other models with similar numbers of parameters.
Author Keywords: forecasting, graph neural network, LSTM, machine learning, neural network, time series
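The abstract above notes that recursion is a common way to generate predictions over long horizons. A minimal sketch of that general idea follows; it is not the thesis's method, and `recursive_forecast` and `mean_model` are hypothetical names, with a trivial lag-average standing in for a trained one-step model.

```python
from typing import Callable, List, Sequence


def recursive_forecast(history: Sequence[float],
                       one_step_model: Callable[[List[float]], float],
                       horizon: int,
                       window: int) -> List[float]:
    """Forecast `horizon` steps by feeding each prediction back as an input.

    `one_step_model` maps the last `window` values to the next value.
    """
    if len(history) < window:
        raise ValueError("history must contain at least `window` values")
    lags = list(history[-window:])
    forecasts: List[float] = []
    for _ in range(horizon):
        y_hat = one_step_model(lags)
        forecasts.append(y_hat)
        # Slide the window: drop the oldest lag, append the new prediction.
        lags = lags[1:] + [y_hat]
    return forecasts


def mean_model(lags: List[float]) -> float:
    """Placeholder one-step model (an assumption): average of the lags."""
    return sum(lags) / len(lags)


predictions = recursive_forecast([1.0, 2.0, 3.0], mean_model,
                                 horizon=3, window=2)
```

After the first step, every input to the model is partly or wholly a previous prediction, so one-step errors can compound over the horizon, which is the stability concern the abstract raises.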
Stability Properties of Disease Models under Economic Expectations
Comprehending the dynamics of infectious diseases is very important in formulating public health policies to tackle their prevalence. Mathematical epidemiology (ME) has played a vital role in achieving this. Nevertheless, classical mathematical epidemiological models do not explicitly model the behavioural responses of individuals to the prevalence of these diseases. Economic epidemiology (EE) as a field has stepped in to fill this gap by integrating economic and mathematical concepts within one framework. This thesis investigated two issues in this area. The methods employed are the standard linear analysis of stability of dynamical systems and numerical simulation. Below are the investigations and the findings of this thesis:
Firstly, an investigation into the stability properties of the equilibria of EE models is carried out. We investigated the stability properties of modified EE systems studied by Aadland et al. [6] by introducing a parametric quadratic utility function into the model, so that the maximum number of contacts made by rational individuals is determined by a parameter. This parameter in particular influences the level of utility of rational individuals. We have shown that if rational individuals have a range of possible contacts to choose from, with the maximum number of contacts allowable for these individuals being dependent on a parameter, the variation in this parameter tends to affect the stability properties of the system. We also showed that under the assumption of permanent recovery from the disease, coupled with individuals observing or not observing their immunity, death and birth rates can affect the stability of the system. These parameters also have an effect on the dynamics of the EE SIS system.
Secondly, an EE model of syphilis infectivity among “men who have sex with men” (MSM) in detention centres is developed in an attempt to look at the effect of behavioural responses on the disease dynamics among MSM. This was done by explicitly incorporating the interplay of the biology of the disease and the behaviour of the inmates. We investigated the stability properties of the system under rational expectations, where we showed that: (1) Behavioural responses to the prevalence of the disease affect the stability of the system. Therefore, public health policies have the tendency of putting the system on indeterminate paths if rational MSM have complete knowledge of the laws governing the motion of the disease states as well as a complete understanding of how others behave in the system when faced with risk-benefit trade-offs. (2) The prevalence of the disease in the long run is influenced by incentives that drive the utility of the MSM inmates. (3) The interplay between the dynamics of the biology of the disease and the behavioural responses of rational MSM tends to bring the system to equilibrium more quickly than its counterpart (that is, when the system is solely dependent on the biology of the disease) when subjected to a small perturbation.
Author Keywords: economic and mathematical epidemiology models, explosive path, indeterminate-path stability, numerical solution, health gap, saddle-path stability, syphilis
Automated Grading of UML Class Diagrams
Learning how to model the structural properties of a problem domain or an object-oriented design in the form of a class diagram is an essential learning task in many software engineering courses. Since grading UML assignments is a cumbersome and time-consuming task, there is a need for an automated grading approach that can assist instructors by speeding up the grading process and ensuring consistency and fairness in large classrooms. This thesis presents an approach for automated grading of UML class diagrams. A metamodel is proposed to establish mappings between the instructor solution and all the solutions for a class, which allows the instructor to easily adjust the grading scheme. The approach uses a grading algorithm that applies syntactic, semantic and structural matching to match a student's solution with the instructor's solution. The efficiency of this automated grading approach has been empirically evaluated in two real-world settings: a beginner undergraduate class of 103 students required to create an object-oriented design model, and an advanced undergraduate class of 89 students elaborating a domain model. The experimental results show that the grading approach should be configurable so that it can adapt the grading strategy and strictness to the level of the students and the grading styles of different instructors. It is also important to consider multiple solution variants in the grading process. The grading algorithm and tool are proposed and validated experimentally.
Author Keywords: automated grading, class diagrams, model comparison
Positive Solutions for Boundary Value Problems of Second Order Ordinary Differential Equations
In this thesis, we study modelling with non-linear ordinary differential equations, and the existence of positive solutions for Boundary Value Problems (BVPs). These problems have wide applications in many areas. The focus is on extensions of previous work on non-linear second-order differential equations with boundary conditions involving the first-order derivative. The contribution of this thesis is fourfold. First, using a fixed point theorem on order intervals, the existence of a positive solution on an interval for a non-local boundary value problem is obtained. Second, considering a different boundary value problem that contains the first-order derivative in the non-linear term, an increasing solution is obtained by applying the Krasnoselskii-Guo fixed point theorem. Third, the existence of two solutions, one solution and no solution for a BVP is proved by using fixed point index and iteration methods. Last, the results of Green's function unify some methods in studying the existence of positive solutions for BVPs of non-linear differential equations. Examples are presented to illustrate the applications of our results.
Author Keywords: Banach Space, Boundary Value Problems, Differential Equations, Fixed Point, Norm, Positive Solutions
The Compression Cone Method on Existence of Solutions for Semi-linear Equations
With wide applications in many fields such as engineering, physics, chemistry, biology and social sciences, semi-linear equations have attracted great interest from researchers in various areas. In the study of the existence of solutions for this class of equations, a general and commonly applied method is the compression cone method for fixed-point index. The main idea is to construct a cone in an ordered Banach space based on the linear part so that the nonlinear part can be examined in a relatively smaller region.
In this thesis, a new class of cones is proposed as a generalization of previous work. The construction of the cone is based on properties of both the linear and nonlinear parts of the equation. As a result, the method is shown to be more adaptable in applications. We prove new results for both semi-linear integral equations and algebraic systems.
Applications are illustrated by examples. Limitations of the new method are also discussed.
Keywords: Algebraic systems; compression cone method; differential equations; existence of solutions; fixed point index; integral equations; semi-linear equations.
Author Keywords: algebraic systems, differential equations, existence of solutions, fixed point index, integral equations, semi-linear equations
The third wheel: How red squirrels affect the dynamics of the lynx-snowshoe hare relationship
Population cycles are regular fluctuations in population densities; however, in recent years many cycles have begun to disappear. With Canada lynx, this dampening has also been seen with decreasing latitude, corresponding to an increase in prey diversity. My study investigates the role of alternate prey in the stability of the lynx-hare cycle by first comparing the functional responses of two sympatric but ecologically distinct predators on a primary and alternate prey. I then populated a three-species predator-prey model to investigate the role of alternate prey in population stability. My results showed that alternate prey can promote stability, though they are unlikely to "stop the cycle". Furthermore, stability offered by alternate prey is contingent on its ability to increase intraspecific competition. My study highlights that population cycles are not governed by a single factor and that future research needs to be cognizant of interactions between alternate prey and intraspecific competition.
Author Keywords: alternate prey, Canis latrans, functional response, Lepus americanus, Lynx canadensis, Tamiasciurus hudsonicus