UNSUPERVISED MACHINE LEARNING APPROACHES FOR ANOMALY DETECTION IN HIGH-DIMENSIONAL DATA


Sanika Thete
Department of Data Science, Dr. D. Y. Patil College of Arts, Science and Commerce, Pimpri, Pune.
Abstract
Detecting anomalies in high-dimensional, highly imbalanced transaction data is critical for financial security. This study evaluates three unsupervised approaches (Isolation Forest, One-Class SVM, and a deep Autoencoder) on the Kaggle Credit Card Fraud Detection dataset (284,807 transactions, of which 492, or ≈0.172%, are fraudulent). The raw features (Time, Amount) were standardized and a 70:30 train–test split was used. The unsupervised models were trained without label information and assessed post hoc using precision, recall, F1-score, and ROC-AUC. The Autoencoder achieved the best discrimination among the unsupervised models (ROC-AUC ≈ 0.96) and high recall for rare fraud cases; Isolation Forest provided a strong balance of performance and interpretability (ROC-AUC ≈ 0.94); One-Class SVM performed acceptably (ROC-AUC ≈ 0.91) but scaled poorly. Supervised baselines (Logistic Regression and Random Forest with SMOTE) reached ROC-AUC ≈ 0.97 and ≈ 0.956, respectively, but rely on labeled data and showed unfavorable precision–recall trade-offs. We discuss deployment considerations (computational cost, interpretability, and real-time processing) and recommend a hybrid pipeline: use Isolation Forest or the Autoencoder for initial screening and a supervised verifier for high-confidence alerts. The proposed framework enhances detection of rare fraudulent events while controlling false positives, making it practical for operational fraud-detection systems.
Keywords: Anomaly detection; Unsupervised learning; Autoencoder; Isolation Forest; One-Class SVM; Credit card fraud
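The post-hoc evaluation described in the abstract (scoring without labels, then comparing the scores against held-back labels) can be illustrated with a minimal Python sketch. It is not the study's code: the anomaly scores, the labels, and the 0.5 threshold are hypothetical values chosen for illustration, and the helper names `roc_auc` and `precision_recall_f1` are assumptions rather than anything named in the paper. ROC-AUC is computed here through its pairwise-ranking interpretation, which is only practical for small samples.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the chance that a random fraud case outscores a random normal one.

    Ties count as half a win. The pairwise loop is O(n_pos * n_neg), which is
    fine for a toy sample but too slow for the full 284,807-row dataset.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float, float]:
    """Flag every score >= threshold as anomalous, then compare with the labels."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical anomaly scores (higher = more anomalous) from any unsupervised model,
# paired with labels that are used only after scoring (1 = fraud, 0 = normal).
scores = [0.10, 0.20, 0.15, 0.90, 0.30, 0.80, 0.25, 0.05, 0.28, 0.35]
labels = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]

auc = roc_auc(scores, labels)
precision, recall, f1 = precision_recall_f1(scores, labels, threshold=0.5)

# Class imbalance reported in the abstract: 492 fraud cases in 284,807 transactions.
fraud_rate = 492 / 284_807

print(f"ROC-AUC={auc:.3f} precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print(f"fraud rate={fraud_rate:.4%}")
```

ROC-AUC is threshold-free, while precision, recall, and F1 depend on where the alert threshold is set. With a fraud rate this low, that threshold choice largely determines how many false positives reach analysts.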
Journal Name: EPRA International Journal of Multidisciplinary Research (IJMR)

Published on: 2025-10-07

Vol: 11
Issue: 10
Month: October
Year: 2025