A COMPARATIVE STUDY ON UNIVARIATE OUTLIER WINSORIZATION METHODS IN DATA SCIENCE CONTEXT

Authors

  • Ali Abuzaid, Al Azhar University - Gaza
  • Eyad Alkronz

DOI:

https://doi.org/10.26398/IJAS.0036-004

Keywords:

Capping; flooring; outlier; quantile-based.

Abstract

Handling outliers is an important step in data analysis, and it can be approached in three ways: accommodation, omission, or winsorization. This article investigates the impact of four winsorization statistics (mean, median, mode, and quantiles) on parameter estimation through an extensive simulation study. Three probability distributions (normal, negative binomial, and exponential) are considered, each with varying degrees of contamination. The simulation results suggest that winsorization is effective for small contamination levels and large sample sizes. Furthermore, it is recommended to winsorize outliers in symmetric distributions using any of the location parameters. However, for asymmetric distributions, the median should be employed. To illustrate these findings, a real dataset on internet usage session durations for 4,500 users, comprising over 2 million records, is fitted to the exponential distribution. The identified outliers are then winsorized using the aforementioned statistics.
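
The winsorization idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `winsorize`, the use of Tukey's 1.5 × IQR fences to flag outliers, and the choice to compute the replacement statistic from the non-outlying values are all assumptions made for illustration, since the abstract does not state how outliers are identified.

```python
import statistics


def winsorize(values, stat="median"):
    """Replace flagged outliers with a chosen statistic (illustrative sketch).

    Assumption: outliers are the points outside Tukey's fences,
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. The abstract does not specify the rule.
    stat: "mean", "median", or "mode" substitute that statistic of the
    non-outlying values; "quantile" floors/caps at the fences instead.
    """
    # Quartiles of the data; the "inclusive" method interpolates within the observed range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    if stat == "quantile":
        # Flooring and capping: pull each outlier back to the nearest fence.
        return [min(max(v, lower), upper) for v in values]

    location = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[stat]
    # Assumption: the replacement value is computed from the inliers only.
    inliers = [v for v in values if lower <= v <= upper]
    replacement = location(inliers)
    return [v if lower <= v <= upper else replacement for v in values]


# Hypothetical data with a single high outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(winsorize(data, "median"))
print(winsorize(data, "quantile"))
```

In this toy example only the value 100 falls outside the fences, so it alone is replaced; the remaining observations are returned unchanged.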

Published

2024-04-16

How to Cite

Abuzaid, A., & Alkronz, E. (2024). A COMPARATIVE STUDY ON UNIVARIATE OUTLIER WINSORIZATION METHODS IN DATA SCIENCE CONTEXT. Statistica Applicata - Italian Journal of Applied Statistics, 36(1). https://doi.org/10.26398/IJAS.0036-004

Section

Latest articles