The purpose of this research is to apply four methods to two datasets, a Synthetic
dataset and a Real-World dataset, and to compare the results with the aim of
identifying effective methods for fraud detection. The methods used are Logistic Regression,
Isolation Forest, an Ensemble Method, and Generative Adversarial Networks (GAN).
Results show that all four models achieve accuracies between 91% and 99%, except that
Isolation Forest achieves only 69% accuracy on the Synthetic dataset.
The four models detect fraud well when built on a training set and evaluated on
a test set. Logistic Regression achieves good results with less computational effort.
Isolation Forest achieves lower accuracy when the data is sparse and not preprocessed
correctly. The Ensemble Method achieves the highest accuracy on both datasets.
GAN achieves good results but overfits if a large number of epochs is used. Future
work could incorporate other classifiers.
Author Keywords: Ensemble Method, GAN, Isolation Forest, Logistic Regression, Outliers