📄 Abstract
Detecting anomalies in high-dimensional, highly imbalanced transaction data is critical for financial security. This study evaluates three unsupervised approaches (Isolation Forest, One-Class SVM, and a deep Autoencoder) on the Kaggle Credit Card Fraud Detection dataset: 284,807 transactions, of which 492 (≈0.172%) are fraudulent. The raw (non-PCA) features Time and Amount were standardized, and the data were divided with a 70:30 train–test split. The unsupervised models were trained without label information and evaluated post hoc using precision, recall, F1-score, and ROC-AUC. The Autoencoder achieved the best discrimination (ROC-AUC ≈ 0.96) and high recall on the rare fraud cases. Isolation Forest offered a strong balance of performance and interpretability (ROC-AUC ≈ 0.94). One-Class SVM performed acceptably (ROC-AUC ≈ 0.91) but scaled poorly. Supervised baselines (Logistic Regression and Random Forest with SMOTE) reached ROC-AUC ≈ 0.97 and ≈ 0.956, respectively, but they depend on labeled data and showed unfavorable precision–recall trade-offs. We discuss deployment considerations (computational cost, interpretability, and real-time processing) and recommend a hybrid pipeline in which Isolation Forest or the Autoencoder performs initial screening and a supervised verifier confirms high-confidence alerts. The proposed framework enhances detection of rare fraudulent events while controlling false positives, making it practical for operational fraud-detection systems.
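The unsupervised models never see labels during training, so evaluation happens after the fact. Each model emits an anomaly score per transaction, and the held-out labels are used only to grade those scores. The sketch below is a minimal, standard-library-only Python illustration of that post hoc step, not the study's code. It computes ROC-AUC through the rank-sum (Mann–Whitney U) identity and computes precision, recall, and F1 at a chosen alert threshold. The function names `roc_auc` and `precision_recall_f1`, the convention that a higher score means "more anomalous", and the toy data are all hypothetical assumptions for illustration.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = fraud (positive), 0 = legitimate; used only for evaluation.
    scores: anomaly scores, assumed here to be higher for more anomalous points.
    Tied scores receive the average of the ranks they span.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both classes in the evaluation set")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U statistic for positives, normalized by the number of (pos, neg) pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def precision_recall_f1(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Flag every score >= threshold as fraud and compare against the labels."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data: 8 legitimate and 2 fraudulent transactions.
    labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.40, 0.70, 0.80, 0.35]
    print(roc_auc(labels, scores))                   # 0.875
    print(precision_recall_f1(labels, scores, 0.5))  # (0.5, 0.5, 0.5)
```

ROC-AUC is threshold-free, but precision, recall, and F1 depend on where the alert threshold is set. At a fraud rate near 0.172%, even a small false-positive rate can drive precision down sharply. In practice one would more likely call scikit-learn's `roc_auc_score` and `precision_recall_fscore_support`. Note also that scikit-learn's `IsolationForest.score_samples` returns lower values for more abnormal points, so those scores would need to be negated to match the convention assumed here.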
📚 How to Cite:
Sanika Thete, "Unsupervised Machine Learning Approaches for Anomaly Detection in High-Dimensional Data," EPRA International Journal of Multidisciplinary Research (IJMR), Volume 11, Issue 10, October 2025.