In the context of Real-Time Bidding (RTB), the machine learning problems of imbalanced classes and model selection are investigated. The Synthetic Minority Oversampling Technique (SMOTE) is commonly used to combat imbalanced classes, but a shortcoming is identified. A distance threshold is identified as a solution, and testing in a live RTB environment shows significant improvement. For model selection, the statistical measure Critical Success Index (CSI) is modified to place more emphasis on recall. This new measure (CSI-R) is compared empirically with other measures such as accuracy, lift, efficiency, true skill score, Heidke skill score and Gilbert skill score. In all cases, CSI-R is shown to be better suited to the RTB industry.
Author Keywords: imbalanced classes, machine learning, online advertising, performance measures, real-time bidding, SMOTE
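
For orientation, several of the comparison measures named above can be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions of accuracy, lift, CSI, true skill score, Heidke skill score and Gilbert skill score. The `Confusion` class, the function names and the example counts are illustrative assumptions, not taken from this work. CSI-R and efficiency are left out because their definitions are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix (hypothetical helper, for illustration only)."""
    tp: int  # true positives (hits)
    fp: int  # false positives (false alarms)
    fn: int  # false negatives (misses)
    tn: int  # true negatives (correct rejections)

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


# All functions assume non-degenerate counts (no zero denominators).

def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / c.n


def lift(c: Confusion) -> float:
    # Precision divided by the base rate of the positive class.
    precision = c.tp / (c.tp + c.fp)
    base_rate = (c.tp + c.fn) / c.n
    return precision / base_rate


def csi(c: Confusion) -> float:
    # Critical Success Index: hits over all cases except true negatives.
    return c.tp / (c.tp + c.fp + c.fn)


def true_skill(c: Confusion) -> float:
    # Recall minus false-positive rate.
    return c.tp / (c.tp + c.fn) - c.fp / (c.fp + c.tn)


def heidke(c: Confusion) -> float:
    num = 2 * (c.tp * c.tn - c.fp * c.fn)
    den = (c.tp + c.fn) * (c.fn + c.tn) + (c.tp + c.fp) * (c.fp + c.tn)
    return num / den


def gilbert(c: Confusion) -> float:
    # CSI corrected for the number of hits expected by chance.
    chance_hits = (c.tp + c.fp) * (c.tp + c.fn) / c.n
    return (c.tp - chance_hits) / (c.tp + c.fp + c.fn - chance_hits)


# Made-up, heavily imbalanced example: 40 positives out of 1000 cases.
example = Confusion(tp=30, fp=20, fn=10, tn=940)
scores = {f.__name__: f(example)
          for f in (accuracy, lift, csi, true_skill, heidke, gilbert)}
```

On these made-up counts, accuracy is 0.97 while CSI is only 0.5, because CSI ignores the true negatives that dominate an imbalanced data set.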