All curves can be plotted using the `up.plot(plot_type=...)` method. `plot_type` can be any of the following values; the notation used in the formulaic representations is defined below the list.
- `qini`: typical Qini curve (see Radcliffe 2007), except we normalize by the total number of people in treatment. The typical definition is

  $$Q(\phi) = n_{t,1}(\phi) - n_{c,1}(\phi)\,\frac{n_t(\phi)}{n_c(\phi)}$$
- `aqini`: adjusted Qini curve, calculated as

  $$Q_{\mathrm{adj}}(\phi) = \frac{n_{t,1}(\phi)}{N_t} - \frac{n_{c,1}(\phi)\,n_t(\phi)}{n_c(\phi)\,N_t}$$
- `cuplift`: cumulative uplift curve, calculated as

  $$u_c(\phi) = \frac{n_{t,1}(\phi)}{n_t(\phi)} - \frac{n_{c,1}(\phi)}{n_c(\phi)}$$
- `uplift`: typical uplift curve, calculated the same as `cuplift` but returning only the average value within each bin, rather than cumulatively.
- `cgains`: cumulative gains curve (see Gutierrez, Gerardy 2016), defined as

  $$G(\phi) = \phi\left(\frac{n_{t,1}(\phi)}{n_t(\phi)} - \frac{n_{c,1}(\phi)}{n_c(\phi)}\right)$$
- `balance`: ratio of treatment group size to total group size within each bin,

  $$B(\phi) = \frac{n_t(\phi)}{n_t(\phi) + n_c(\phi)}$$
Above, $\phi$ corresponds to the fraction of individuals targeted -- the x-axis of these curves. $n$ and $N$ correspond to counts up to $\phi$ (except for the uplift curve, which is only within the bin at the $\phi$ position) or within the entire group, respectively. The subscript $t$ indicates the treatment group, and $c$, the control. The subscript $1$ indicates the subset of the count for which individuals had a positive outcome.
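To make the notation concrete, here is a minimal NumPy sketch that computes an adjusted-qini-style curve from toy arrays. All names here are hypothetical, and this is an illustration of the formula rather than pylift's implementation:

```python
import numpy as np

# Toy data: binary treatment flag, binary outcome, and model scores.
treatment = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
outcome   = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])
score     = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])

# Sort by descending score, then accumulate counts up to each position.
order = np.argsort(-score)
t, y = treatment[order], outcome[order]

n_t  = np.cumsum(t)            # n_t(phi): treated individuals seen so far
n_c  = np.cumsum(1 - t)        # n_c(phi): control individuals seen so far
n_t1 = np.cumsum(t * y)        # n_{t,1}(phi): treated responders so far
n_c1 = np.cumsum((1 - t) * y)  # n_{c,1}(phi): control responders so far
N_t  = t.sum()                 # N_t: total treated count

# Adjusted qini at each targeted fraction phi:
# n_{t,1}(phi)/N_t - n_{c,1}(phi) * n_t(phi) / (n_c(phi) * N_t)
with np.errstate(divide='ignore', invalid='ignore'):
    aqini = n_t1 / N_t - np.where(n_c > 0, n_c1 * n_t / (n_c * N_t), 0.0)

phi = np.arange(1, len(score) + 1) / len(score)
```

Plotting `aqini` against `phi` reproduces the general shape of the curve; pylift additionally handles binning, ties, and normalization details.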
A number of scores are stored in both the `test_results_` and `train_results_` objects, containing scores calculated over the test set and train set, respectively. Namely, there are three important scores:
- `Q`: the unnormalized area between the qini curve and the random selection line.
- `q1`: `Q`, normalized by the theoretical maximum value of `Q`.
- `q2`: `Q`, normalized by the practical maximum value of `Q`.
Each of these can be accessed as attributes of `test_results_` and `train_results_`; `_qini`, `_aqini`, or `_cgains` can be appended to obtain the same calculation for the qini curve, the adjusted qini curve, or the cumulative gains curve, respectively. The score least affected by anomalous treatment/control ordering, without any bias toward treatment or control (i.e. if you're looking at lift between two equally viable treatments), is the
`q1_cgains` score, but if you are looking at a simple treatment vs. control situation, `q1_aqini` is preferred. Because this score only has real meaning over an independent holdout (test) set, the most useful value to access would likely be:
```python
up.test_results_.q1_aqini  # Over the test set.
```
Maximal curves can also be toggled by passing flags into `up.plot()`.
Each of these curves shows the maximally attainable curve given different assumptions about the underlying data. The
show_theoretical_max curve corresponds to a sorting in which we assume that an individual is persuadable (uplift = 1) if and only if they respond in the treatment group (and the same reasoning applies to the control group, for sleeping dogs). The
show_practical_max curve assumes that all individuals that have a positive outcome in the treatment group must also have a counterpart (relative to the proportion of individuals in the treatment and control group) in the control group that did not respond. This is a more conservative, realistic curve. The former can only be attained through overfitting, while the latter can only be attained under very generous circumstances. Within the package, we also calculate the
show_no_dogs curve, which simply precludes the possibility of negative effects.
The random selection line is shown by default, but the option to toggle it off is included in case you'd like to plot multiple plots on top of each other.
The code below plots the practical max over the aqini curve of a model contained in the `TransformedOutcome` object `up`, then overlays the aqini curve of a second model contained in `up1`, also changing the line color.
```python
ax = up.plot(show_practical_max=True, show_random_selection=False, label='Model 1')
up1.plot(ax=ax, label='Model 2', color=[0.7, 0, 0])
```
It is often useful to obtain error bars on your qini curves. We've implemented two ways to do this:
- `up.shuffle_fit()`: seeds the `train_test_split`, fits the model over the new training data, and evaluates on the new test data, then averages the resulting curves.
- `up.noise_fit()`: randomly shuffles the labels independently of the features and fits a model. This can help distinguish your evaluation curves from noise.
Adjustments can also be made to the aesthetics of these curves by passing in dictionaries that pass down to plot elements. For example, `shuffle_band_kwargs` is a dictionary of kwargs that modifies the `fill_between` shaded error bar region.
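The averaging idea behind `shuffle_fit` can be sketched generically. In this toy sketch, `toy_curve` is a hypothetical stand-in for one re-split/re-fit/re-evaluate iteration, and the mean and standard deviation across iterations give the central curve and the shaded error band:

```python
import numpy as np

def toy_curve(seed):
    """Stand-in for one shuffle_fit iteration: re-split the data, re-fit
    the model, and return an evaluation curve on the new test set.  Here
    we just return a noisy concave curve for illustration."""
    r = np.random.default_rng(seed)
    phi = np.linspace(0, 1, 21)
    return phi * (1 - phi) + r.normal(0, 0.01, phi.shape)

# Average several independent re-fits; the band is +/- one std dev,
# which is what a fill_between call would shade.
curves = np.stack([toy_curve(s) for s in range(30)])
mean_curve = curves.mean(axis=0)
band = curves.std(axis=0)
```

The shaded region would then be drawn with matplotlib's `fill_between(phi, mean_curve - band, mean_curve + band, ...)`, which is the plot element that `shuffle_band_kwargs` modifies.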
The `UpliftEval` class can also be used independently to apply the above evaluation visualizations and calculations. Note that the `up` object uses `UpliftEval` to generate its plots, so the `UpliftEval` objects for the train set and test set can be obtained in `up.train_results_` and `up.test_results_`, respectively. To use `UpliftEval` directly:
```python
from pylift.eval import UpliftEval

upev = UpliftEval(treatment, outcome, predictions)
upev.plot(plot_type='aqini')
```
It generally functions the same as the `up.plot()` method, except error bars cannot be obtained. Note that `UpliftEval` can still be used, however, to manually generate the curves, which can then be aggregated to make error bars.