Fraud Detection in Financial Businesses Using Data Mining Approaches

Abstract

The purpose of this research is to apply four methods to two datasets, a Synthetic dataset and a Real-World dataset, and to compare their results, with the aim of identifying methods to prevent fraud. The methods used are Logistic Regression, Isolation Forest, an Ensemble Method and Generative Adversarial Networks (GANs).

Results show that all four models achieve accuracies between 91% and 99%, except Isolation Forest, which gave 69% accuracy on the Synthetic dataset.

The four models detect fraud well when built on a training set and evaluated on a test set. Logistic Regression achieves good results with less computational effort. Isolation Forest achieves lower accuracy when the data is sparse and not preprocessed correctly. Ensemble models achieve the highest accuracy on both datasets. The GAN achieves good results but overfits if a large number of epochs is used. Future work could incorporate other classifiers.
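The evaluation protocol described in the abstract, building a model on a training set and measuring its accuracy on a held-out test set, can be illustrated with a minimal, dependency-free sketch of Logistic Regression. Everything in it is an assumption for illustration only: the toy two-feature data produced by the hypothetical `make_synthetic` helper, the 5% fraud rate, the 70/30 split, and plain batch gradient descent. It is not the thesis's dataset, code or hyperparameters.

```python
import math
import random


def make_synthetic(n, fraud_rate, seed):
    """Hypothetical toy data: legitimate rows centred at (0, 0), fraud rows at (3, 3)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < fraud_rate else 0
        centre = 3.0 if label else 0.0
        features = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
        rows.append((features, label))
    return rows


def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and cut it into training and test portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(train, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(train[0][0])
    w = [0.0] * n_features
    b = 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in train:
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x, threshold=0.5):
    """Label a row as fraud (1) when its predicted probability reaches the threshold."""
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(model, x) == y for x, y in rows)
    return correct / len(rows)


if __name__ == "__main__":
    data = make_synthetic(n=1000, fraud_rate=0.05, seed=0)
    train, test = train_test_split(data, test_fraction=0.3, seed=1)
    model = fit_logistic(train)  # built on the training set only
    print(f"test accuracy: {accuracy(model, test):.3f}")
```

Batch gradient descent is used here only to keep the example free of third-party dependencies; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used, and the same split-then-score pattern applies to the other three methods.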

Author Keywords: Ensemble Method, GAN, Isolation forest, Logistic Regression, Outliers

Item Description

Contributors

Thesis advisor (ths): McConnell, Sabine

Thesis advisor (ths): Hurley, Richard

Degree granting institution (dgg): Trent University

Date Issued

2020

Date (Unspecified)

2020

Place Published

Peterborough, ON

Extent

195 pages

Local Identifier

TC-OPET-10773

Publisher

Trent University

Degree