Information science
SMOTE and Performance Measures for Machine Learning Applied to Real-Time Bidding
In the context of Real-Time Bidding (RTB), the machine learning problems of imbalanced classes and model selection are investigated. The Synthetic Minority Oversampling Technique (SMOTE) is commonly used to combat imbalanced classes, but a shortcoming is identified. Use of a distance threshold is identified as a solution, and testing in a live RTB environment shows significant improvement. For model selection, the statistical measure Critical Success Index (CSI) is modified to add emphasis on recall. This new measure (CSI-R) is empirically compared with other measures such as accuracy, lift, efficiency, true skill score, Heidke's skill score and Gilbert's skill score. In all cases CSI-R is shown to be better suited to the RTB industry.
Author Keywords: imbalanced classes, machine learning, online advertising, performance measures, real-time bidding, SMOTE
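For readers unfamiliar with the measures named in this abstract, the following is a minimal sketch of several of them, computed from a binary confusion matrix using their standard textbook definitions. The class and function names are illustrative assumptions, not taken from the thesis. CSI-R is not shown because the abstract does not give its formula, and lift and efficiency are omitted because they depend on how predictions are ranked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / c.n


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def csi(c: Confusion) -> float:
    """Critical Success Index: hits over hits, misses and false alarms."""
    return c.tp / (c.tp + c.fp + c.fn)


def true_skill(c: Confusion) -> float:
    """True skill score: recall minus false-positive rate."""
    return c.tp / (c.tp + c.fn) - c.fp / (c.fp + c.tn)


def heidke(c: Confusion) -> float:
    """Heidke's skill score: accuracy corrected for chance agreement."""
    num = 2 * (c.tp * c.tn - c.fp * c.fn)
    den = (c.tp + c.fn) * (c.fn + c.tn) + (c.tp + c.fp) * (c.fp + c.tn)
    return num / den


def gilbert(c: Confusion) -> float:
    """Gilbert's skill score: CSI corrected for hits expected by chance."""
    tp_random = (c.tp + c.fn) * (c.tp + c.fp) / c.n
    return (c.tp - tp_random) / (c.tp + c.fp + c.fn - tp_random)


# Made-up counts for a heavily imbalanced problem (1% positives).
model = Confusion(tp=5, fp=10, fn=5, tn=980)
always_negative = Confusion(tp=0, fp=0, fn=10, tn=990)

for name, c in [("model", model), ("always negative", always_negative)]:
    print(f"{name}: accuracy={accuracy(c):.3f} CSI={csi(c):.3f} "
          f"TSS={true_skill(c):.3f} HSS={heidke(c):.3f} GSS={gilbert(c):.3f}")
```

With these made-up counts, the classifier that never predicts the positive class has the higher accuracy but a CSI of zero, which illustrates why accuracy alone can mislead on imbalanced data.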
My Canadian Story: Multiculturalism and Meaning-Making in Local Archives
Canada prides itself on being a multicultural nation, but the stories of people who are not "Canadian-Canadians," as defined by Eva Mackey, are underrepresented in archives. This project investigates three local archives and one online archive in Peterborough, Ontario, employing Rita Dhamoon's practice of "accounts of meaning-making" to understand how archives contribute to a community's understanding of itself and who belongs there. The findings indicate that the stories of "other" populations in Peterborough's archival records are frequently mediated by the city's "Canadian-Canadians," who have portrayed these populations as transient and only temporarily settled in the city. This account of meaning-making provides an entry point for changing this understanding and making archives more welcoming and accessible in the city and beyond.
Author Keywords: Archives, Community, Identity, Immigration, Integration, Multiculturalism
Utilizing Class-Specific Thresholds Discovered by Outlier Detection
We investigated whether the performance of selected supervised machine-learning techniques could be improved by combining univariate outlier-detection techniques with machine-learning methods. We developed a framework to discover class-specific thresholds in class probability estimates using univariate outlier detection and proposed two novel techniques to utilize these class-specific thresholds. These proposed techniques were applied to various data sets and the results were evaluated. Our experimental results suggest that some of our techniques may improve recall in the base learner. Additional results suggest that one technique may produce higher accuracy and precision than AdaBoost.M1, while another may produce higher recall. Finally, our results suggest that we can achieve higher accuracy, precision, or recall when AdaBoost.M1 fails to produce higher metric values than the base learner.
Author Keywords: AdaBoost, Boosting, Classification, Class-Specific Thresholds, Machine Learning, Outliers
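The general idea of class-specific thresholds on class probability estimates can be illustrated with a small sketch. The abstract does not say which outlier detector is used or how the thresholds are applied, so everything below is an assumption made for illustration: Tukey's lower fence serves as the univariate outlier rule, a class is eligible only if its estimated probability reaches that class's threshold, and all function names are hypothetical. It is not the framework proposed in the thesis.

```python
from statistics import quantiles


def lower_fence(values, k=1.5):
    """Tukey lower fence Q1 - k*IQR (assumed outlier rule), clipped at 0."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(0.0, q1 - k * (q3 - q1))


def class_thresholds(proba, labels, k=1.5):
    """One threshold per class, taken from the probabilities that the base
    learner assigned to class c on training instances whose true label is c."""
    n_classes = len(proba[0])
    thresholds = []
    for c in range(n_classes):
        own = [row[c] for row, y in zip(proba, labels) if y == c]
        thresholds.append(lower_fence(own, k))
    return thresholds


def predict(row, thresholds):
    """Pick the most probable class among those meeting their threshold;
    fall back to the plain argmax when no class qualifies."""
    eligible = [c for c, p in enumerate(row) if p >= thresholds[c]]
    candidates = eligible or range(len(row))
    return max(candidates, key=lambda c: row[c])


# Made-up probability estimates [P(class 0), P(class 1)] on training data.
train_proba = [
    [0.80, 0.20], [0.85, 0.15], [0.90, 0.10], [0.95, 0.05], [1.00, 0.00],
    [0.70, 0.30], [0.60, 0.40], [0.50, 0.50], [0.40, 0.60], [0.30, 0.70],
]
train_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

thresholds = class_thresholds(train_proba, train_labels)
print(thresholds)
print(predict([0.55, 0.45], thresholds))
```

In this toy data the learner is confident on class 0 and hesitant on class 1, so class 0 receives a high threshold. An instance scored `[0.55, 0.45]` is then assigned to class 1 even though a plain argmax would choose class 0, which is one way such thresholds could raise recall on a weakly predicted class.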