Usage Guide

Python Package Usage

You can import the scikit-duplo package (module name skduplo) in Python and use its components inside your scikit-learn ML pipelines.

QuantileStackRegressor

The QuantileStackRegressor is a meta learner that performs a regression task by learning internal quantile models over the target variable and using their outputs as a set of new features. The example below constructs a QuantileStackRegressor that can then be used as a step in a prediction pipeline.

from skduplo.meta import QuantileStackRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import RandomForestClassifier

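qsr = QuantileStackRegressor(
    classifier=RandomForestClassifier(),  # model used for the internal cut-based tasks
    regressor=ExtraTreesRegressor(),      # regressor that makes the final prediction
    cuts=[0, 50, 100, 200]                # cut points on the regression target value
)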

This model learns a set of internal classifiers, each defined by cutting the training data at one of the cut points on the regression target value. In effect, the model learns a quantile stack in an out-of-sample fashion and then uses the outputs of these internal models as a set of new features.
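To make the cut-and-stack idea concrete, here is a small pure-Python sketch. It is an illustration under stated assumptions, not the skduplo implementation: it assumes each value in cuts defines a binary "target exceeds this cut" task, uses a toy one-nearest-neighbour rule in place of the classifier, and assigns folds with i % n_folds. The helper names cut_labels, nearest_neighbour and out_of_fold_cut_features are hypothetical.

```python
from typing import List, Sequence

# Hypothetical sketch, not the skduplo implementation: assumes each cut c
# defines the binary task "does the target exceed c?".
def cut_labels(y: Sequence[float], cuts: Sequence[float]) -> List[List[int]]:
    """Return one 0/1 label per cut for every sample."""
    return [[int(value > c) for c in cuts] for value in y]


def nearest_neighbour(train_x: Sequence[float],
                      train_labels: Sequence[List[int]],
                      query: float) -> List[int]:
    """Toy stand-in for the internal classifier: copy the closest point's labels."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return list(train_labels[best])


def out_of_fold_cut_features(x: Sequence[float],
                             labels: Sequence[List[int]],
                             n_folds: int) -> List[List[int]]:
    """Predict each sample's cut labels with a model that never saw that sample."""
    n = len(x)
    features: List[List[int]] = [[] for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        train_x = [x[i] for i in train]
        train_labels = [labels[i] for i in train]
        for i in held_out:
            features[i] = nearest_neighbour(train_x, train_labels, x[i])
    return features


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # a single input feature
y = [5.0, 40.0, 60.0, 120.0, 180.0, 260.0]  # regression target
cuts = [0, 50, 100, 200]

labels = cut_labels(y, cuts)
oof = out_of_fold_cut_features(x, labels, n_folds=2)
# Stack the out-of-fold cut predictions next to the original feature.
stacked = [[xi] + row for xi, row in zip(x, oof)]
```

Note that the last sample's stacked cut features come from a model trained only on the other fold, so they need not match its own labels; that separation is what keeps the new features out-of-sample.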

MultiStackRegressor

The MultiStackRegressor is a meta learner that extends the QuantileStackRegressor: in addition to the quantile models, it learns a set of out-of-sample regressors whose predictions are also used as internal features. The example below constructs a MultiStackRegressor that can then be used as a step in a prediction pipeline.

from skduplo.meta import MultiStackRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier

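msr = MultiStackRegressor(
    classifier=RandomForestClassifier(),                          # model for the internal cut-based tasks
    regressor_list=[ExtraTreesRegressor(), RandomForestRegressor()],  # intermediate regressors
    regressor=ExtraTreesRegressor(),                              # regressor that makes the final prediction
    cuts=[0, 50, 100, 200]                                        # cut points on the regression target value
)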

Note that the regressor_list parameter accepts an arbitrary list of scikit-learn-compatible regressors, which the model trains internally on cross-validated data to produce intermediate predictions.
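The general idea of cross-validated intermediate predictions can be sketched with the Python standard library alone. This is not the skduplo implementation: the fitter interface (a function that trains on xs and ys and returns a predict function), the toy regressors fit_mean and fit_line, and the i % n_folds fold assignment are all assumptions chosen for brevity. Each intermediate prediction for a sample comes from models trained on folds that exclude it, so the stacked columns do not leak that sample's own target.

```python
from typing import Callable, List, Sequence

# Hypothetical interface for this sketch: a "fitter" trains on (xs, ys) and
# returns a predict function. This is not the scikit-learn or skduplo API.
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Baseline regressor: always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def out_of_fold_predictions(xs: Sequence[float], ys: Sequence[float],
                            fitters: Sequence[Fitter],
                            n_folds: int) -> List[List[float]]:
    """One column per regressor; each row is predicted by models that never saw it."""
    n = len(xs)
    preds = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j, fit in enumerate(fitters):
            model = fit(train_x, train_y)
            for i in held_out:
                preds[i][j] = model(xs[i])
    return preds


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
regressor_list = [fit_mean, fit_line]  # plays the role of regressor_list
intermediate = out_of_fold_predictions(xs, ys, regressor_list, n_folds=2)
# Append the intermediate predictions to the original feature.
stacked = [[x] + row for x, row in zip(xs, intermediate)]
```

On this toy data the line fits every fold exactly, so its column reproduces ys, while the mean regressor's column only reflects the other fold's average; a final regressor trained on stacked could learn to weight the two columns accordingly.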