Alam, Omar
Automated Grading of UML Use Case Diagrams
This thesis presents an approach for automated grading of UML Use Case diagrams. Many software engineering courses require students to learn how to model the behavioural features of a problem domain or an object-oriented design in the form of a use case diagram. Because assessing UML assignments is time-consuming and labor-intensive, there is a need for an automated grading approach that helps instructors by speeding up the grading process while maintaining uniformity and fairness in large classrooms. The effectiveness of this automated grading approach was assessed by applying it to two real-world assignments. We show that the automated grades are close to manual grades: the difference was less than 7% on average, and when we applied strategies such as configuring settings and using multiple solutions, the average difference was even lower. The grading methods and the tool are proposed and empirically validated.
Author Keywords: Automated Grading, Compare Models, Use Case
Influence of geodemographic factors on electricity consumption and forecasting models
The residential sector is a major consumer of electricity, and its demand is projected to rise by 65 percent by the end of 2050. The electricity consumption of a household is determined by various factors, e.g., house size, socio-economic status of the family, and size of the family. Previous studies have identified only a limited number of socio-economic and dwelling factors. In this thesis, we study the significance of 826 geodemographic factors on electricity consumption for 4917 homes in the City of London. Geodemographic factors cover a wide array of categories, e.g., social, economic, dwelling, family structure, health, education, finance, occupation, and transport. Using Spearman correlation, we have identified 354 factors that are strongly correlated with electricity consumption. We also examine the impact of using geodemographic factors in designing forecasting models. In particular, we develop an encoder-decoder LSTM model which shows improved accuracy with geodemographic factors. We believe that our study will help energy companies design better energy management strategies.
Author Keywords: Electricity forecasting, Encoder-decoder model, Geodemographic factors, Socio-economic factors
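The Spearman-based screening of factors described in the abstract above can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the example factor names, and the 0.5 cutoff for "strongly correlated" are all assumptions made for illustration (the abstract does not state the threshold used).

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def strongly_correlated(factors, consumption, threshold=0.5):
    """Keep factors whose |rho| meets the (hypothetical) threshold."""
    selected = {}
    for name, values in factors.items():
        rho = spearman(values, consumption)
        if abs(rho) >= threshold:
            selected[name] = rho
    return selected
```

Called with a dictionary that maps each factor name to one value per home, plus a list of per-home consumption values, `strongly_correlated` returns only the factors that pass the cutoff, together with their correlation coefficients.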
Time Series Algorithms in Machine Learning - A Graph Approach to Multivariate Forecasting
Forecasting future values of time series has long been a field with many and varied applications, from climate and weather forecasting to stock prediction and economic planning to the control of industrial processes. Many of these problems involve not just a single time series but many simultaneous series which may influence each other. This thesis provides machine learning methods for handling such problems.
We first consider single time series with both single and multiple features. We review the algorithms and unique challenges involved in applying machine learning to time series. Many machine learning algorithms, when used for regression, are designed to produce a single output value for each timestamp of interest with no measure of confidence; however, evaluating the uncertainty of the predictions is an important component of practical forecasting. We therefore discuss methods of constructing uncertainty estimates in the form of prediction intervals for each prediction. Stability over long time horizons is also a concern for these algorithms, as recursion is a common method used to generate predictions over long time intervals. To address this, we present methods of maintaining stability in the forecast even over long time horizons. These methods are applied to an electricity forecasting problem, where we demonstrate their effectiveness for support vector machines, neural networks and gradient boosted trees.
We next consider spatiotemporal problems, which consist of multiple interlinked time series, each of which may contain multiple features. We represent these problems using graphs, allowing us to learn relationships using graph neural networks. Existing methods of doing this generally make use of separate time and spatial (graph) layers, or simply replace operations in temporal layers with graph operations. We show that these approaches have difficulty learning relationships that contain time lags of several time steps. To address this, we propose a new layer inspired by the long short-term memory (LSTM) recurrent neural network which adds a distinct memory state dedicated to learning graph relationships while keeping the original memory state. This allows the model to consider temporally distant events at other nodes without affecting its ability to model long-term relationships at a single node. We show that this model is capable of learning the long-term patterns that existing models struggle with. We then apply this model to a number of real-world bike-share and traffic datasets, where we observe improved performance when compared to other models with similar numbers of parameters.
Author Keywords: forecasting, graph neural network, LSTM, machine learning, neural network, time series
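The abstract above notes that recursion is a common way to generate predictions over long horizons. The following minimal sketch shows only that recursion: each one-step prediction is fed back in as an input for the next step. It does not implement the thesis's stabilization methods, and `recursive_forecast` and the stand-in model `damped_trend` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    one_step_model: Callable[[List[float]], float],
    window: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps ahead by feeding predictions back as inputs."""
    if len(history) < window:
        raise ValueError("history must contain at least `window` values")
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest lag, append the new prediction.
        buffer = buffer[1:] + [next_value]
    return forecasts


def damped_trend(lags: List[float]) -> float:
    """Hypothetical stand-in model: last value plus half the last change."""
    return lags[-1] + 0.5 * (lags[-1] - lags[-2])
```

Because every step after the first consumes earlier predictions, any one-step error is carried forward, which is why stability over long horizons becomes a concern. In practice `one_step_model` would wrap a trained regressor such as a support vector machine, neural network, or gradient boosted tree model.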
Predicting Irregularities in Arrival Times for Toronto Transit Buses with LSTM Recurrent Neural Networks Using Vehicle Locations and Weather Data
Public transportation systems play an important role in the quality of life of citizens in any metropolitan city. However, public transportation authorities face criticism from commuters due to irregularities in bus arrival times. For example, transit bus users often complain when they miss the bus because it arrived too early or too late at the bus stop. Due to these irregularities, commuters may miss important appointments, wait too long at the bus stop, or arrive late for work. This thesis seeks to predict the occurrence of irregularities in bus arrival times by developing machine learning models that use GPS locations of transit buses provided by the Toronto Transit Commission (TTC) and hourly weather data. We found that nearly 37% of the time, buses arrive either early or late by more than 5 minutes, suggesting room for improvement in the current strategies employed by transit authorities. We compared the performance of three machine learning models, of which our Long Short-Term Memory (LSTM) [13] model outperformed the others in terms of accuracy: its error rate was lower than those of the Artificial Neural Network (ANN) and support vector regression (SVR) models. The improved accuracy achieved by LSTM is due to its ability to adjust and update the weights of neurons while maintaining long-term dependencies when encountering new streams of data.
Author Keywords: ANN, LSTM, Machine Learning
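The irregularity measure quoted in the abstract above (arrivals early or late by more than 5 minutes) can be sketched as follows. This is an illustration only: it assumes that scheduled and actual arrival times have already been paired, which the thesis would have had to derive from GPS locations, and `irregular_share` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def irregular_share(
    arrivals: List[Tuple[datetime, datetime]],
    threshold: timedelta = timedelta(minutes=5),
) -> float:
    """Fraction of (scheduled, actual) pairs that differ by more than `threshold`."""
    if not arrivals:
        raise ValueError("arrivals must not be empty")
    irregular = sum(
        1 for scheduled, actual in arrivals if abs(actual - scheduled) > threshold
    )
    return irregular / len(arrivals)
```

Taking the absolute difference treats early and late arrivals symmetrically, matching the abstract's "either early or late" wording. A deviation of exactly 5 minutes is not counted as irregular in this sketch.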
Automated Grading of UML Class Diagrams
Learning how to model the structural properties of a problem domain or an object-oriented design in the form of a class diagram is an essential learning task in many software engineering courses. Since grading UML assignments is a cumbersome and time-consuming task, there is a need for an automated grading approach that can assist instructors by speeding up the grading process, as well as ensuring consistency and fairness for large classrooms. This thesis presents an approach for automated grading of UML class diagrams. A metamodel is proposed to establish mappings between the instructor solution and all the solutions for a class, which allows the instructor to easily adjust the grading scheme. The approach relies on a grading algorithm that uses syntactic, semantic and structural matching to match a student's solution with the instructor's solution. The efficiency of this automated grading approach has been empirically evaluated in two real-world settings: a beginner undergraduate class of 103 students required to create an object-oriented design model, and an advanced undergraduate class of 89 students elaborating a domain model. The experimental results show that the grading approach should be configurable so that it can adapt the grading strategy and strictness to the level of the students and the grading styles of different instructors. It is also important to consider multiple solution variants in the grading process. The grading algorithm and tool are proposed and validated experimentally.
Author Keywords: automated grading, class diagrams, model comparison
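Of the three kinds of matching named in the abstract above, the syntactic one is the simplest to illustrate. The sketch below compares class names by string similarity using Python's standard `difflib` module. It is an assumption-laden illustration, not the thesis's algorithm: the greedy matching strategy, the 0.8 threshold, and the function names are all hypothetical, and semantic and structural matching are not covered.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_classes(
    instructor: List[str], student: List[str], threshold: float = 0.8
) -> Dict[str, str]:
    """Greedily pair each instructor class with its most similar unused student class."""
    matches: Dict[str, str] = {}
    unused = list(student)
    for expected in instructor:
        if not unused:
            break
        best = max(unused, key=lambda s: name_similarity(expected, s))
        # Accept the pairing only if it clears the (hypothetical) threshold.
        if name_similarity(expected, best) >= threshold:
            matches[expected] = best
            unused.remove(best)
    return matches


def class_score(
    instructor: List[str], student: List[str], threshold: float = 0.8
) -> float:
    """Fraction of instructor classes that found a match in the student solution."""
    if not instructor:
        raise ValueError("instructor solution must contain at least one class")
    return len(match_classes(instructor, student, threshold)) / len(instructor)
```

Exposing `threshold` as a parameter reflects the abstract's point that strictness should be configurable: a lower value tolerates more spelling variation, which may suit a beginner class, while a higher value grades more strictly.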