Feature Binning and Quantile Transformation


11.06.2021

As part of the research behind applications such as StockPicker, we have recently implemented feature binning and quantile transformation to classify data more accurately. Thanks to this upgraded data preparation, our machine learning models now achieve better results.
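As a minimal sketch of what such a preprocessing step can look like, the snippet below maps features onto a uniform distribution with scikit-learn's QuantileTransformer before fitting a scale-sensitive classifier. The synthetic data, the KNeighborsClassifier, and all parameter values are illustrative assumptions, not the actual StockPicker pipeline.

```python
# Illustrative sketch only: quantile transformation as a preprocessing step
# ahead of a scale-sensitive classifier, on synthetic data (not the actual
# StockPicker pipeline).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in data; real usage would plug in the application's features.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

pipeline = Pipeline([
    # Map each feature onto a uniform distribution via its empirical quantiles,
    # which tames skew and outliers before distance-based classification.
    ("quantile", QuantileTransformer(n_quantiles=100,
                                     output_distribution="uniform",
                                     random_state=0)),
    ("model", KNeighborsClassifier(n_neighbors=5)),
])

scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print("mean cross-validated accuracy: %.3f" % scores.mean())
```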

Why (goal)

What (key points)

How (procedure)

Discretization

Binning, also known as categorization or discretization, is the process of translating a quantitative variable into a set of two or more qualitative buckets (i.e., categories)

— Page 129, Feature Engineering and Selection, 2019.

Different methods for grouping the values into k discrete bins can be used; common techniques include:

- Uniform: each bin has the same width in the span of the variable's values.
- Quantile: each bin holds (approximately) the same number of observations, with boundaries placed at quantiles.
- Clustered: bins correspond to clusters (for example, found with k-means), and each value is assigned to its nearest cluster.
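A minimal sketch of these three strategies, assuming scikit-learn's KBinsDiscretizer and a single synthetic, right-skewed feature (the data and parameter values are illustrative, not taken from our models):

```python
# Illustrative sketch: three common binning strategies via scikit-learn's
# KBinsDiscretizer, applied to one synthetic right-skewed feature.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(1000, 1))  # right-skewed feature

for strategy in ("uniform", "quantile", "kmeans"):
    binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy=strategy)
    X_binned = binner.fit_transform(X)
    # bin_edges_ holds the learned cut points for each feature
    edges = np.round(binner.bin_edges_[0], 2)
    counts = np.bincount(X_binned[:, 0].astype(int), minlength=5)
    print(f"{strategy:>8}: edges={edges}, counts per bin={counts}")
```

On a skewed feature like this, uniform bins leave most observations in the lowest bin, while the quantile strategy places roughly the same number of observations in every bin.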

Discussion

My short answer to when binning is OK to use is this: When the points of discontinuity are already known before looking at the data (these are the bin endpoints) and if it is known that the relationship between x and y within each bin that has non-zero length is flat.

— Frank Harrell, Feb 4 ’19 at 16:49, Stack Overflow
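For completeness, a small sketch of exactly that situation, where the cut points are fixed by domain knowledge before looking at the sample. The age thresholds below are hypothetical values chosen only for illustration, and numpy.digitize is just one way to apply pre-specified bin edges:

```python
# Illustrative sketch: binning with cut points fixed *before* seeing the data,
# which is the situation Harrell describes as acceptable.
import numpy as np

# Hypothetical, domain-given thresholds (not derived from the sample).
known_edges = [18, 40, 65]

ages = np.array([12, 25, 37, 41, 58, 70, 83])
bins = np.digitize(ages, bins=known_edges)  # 0: <18, 1: 18-39, 2: 40-64, 3: >=65
print(bins)  # [0 1 1 2 2 3 3]
```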