Usage: evaluation
Within the up
object
up
objectAll curves can be plotted using
Where plot_type
can be any of the following values. In the formulaic representations
qini
: typical Qini curve (see Radcliffe 2007), except we normalize by the total number of people in treatment. The typical definition isaqini
: adjusted Qini curve, calculated ascuplift
: cumulative uplift curve, calculated asuplift
: typical uplift curve, calculated the same as cuplift but only returning the average value within the bin, rather than cumulatively.cgains
: cumulative gains curve (see Gutierrez, Gerardy 2016), defined asbalance
: ratio of treatment group size to total group size within each bin,
Above, corresponds to the fraction of individuals targeted -- the x-axis of these curves. and correspond to counts up to (except for the uplift curve, which is only within the bin at the position) or within the entire group, respectively. The subscript indicates the treatment group, and , the control. The subscript indicates the subset of the count for which individuals had a positive outcome.
A number of scores are stored in both the test_results_
and train_results_
objects, containing scores calculated over the test set and train set, respectively. Namely, there are three important scores:
Q
: unnormalized area between the qini curve and the random selection line.q1
:Q
, normalized by the theoretical maximum value ofQ
.q2
:Q
, normalized by the practical maximum value ofQ
.
Each of these can be accesses as attributes of test_results_
or train_results_
. Either _qini
, _aqini
, or _cgains
can be appended to obtain the same calculation for the qini curve, adjusted qini curve, or the cumulative gains curve, respectively. The score most unaffected by anomalous treatment/control ordering, without any bias to treatment or control (i.e. if you're looking at lift between two equally viable treatments) is the q1_cgains
score, but if you are looking at a simple treatment vs. control situation, q1_aqini
is preferred. Because this only really has meaning over an independent holdout [test] set, the most valuable value to access, then, would likely be up.test_results_.q1_aqini
.
Maximal curves can also be toggled by passing flags into up.plot()
.
show_theoretical_max
show_practical_max
show_no_dogs
show_random_selection
Each of these curves satisfies shows the maximally attainable curve given different assumptions about the underlying data. The show_theoretical_max
curve corresponds to a sorting in which we assume that an individual is persuadable (uplift = 1) if and only if they respond in the treatment group (and the same reasoning applies to the control group, for sleeping dogs). The show_practical_max
curve assumes that all individuals that have a positive outcome in the treatment group must also have a counterpart (relative to the proportion of individuals in the treatment and control group) in the control group that did not respond. This is a more conservative, realistic curve. The former can only be attained through overfitting, while the latter can only be attained under very generous circumstances. Within the package, we also calculate the show_no_dogs
curve, which simply precludes the possibility of negative effects.
The random selection line is shown by default, but the option to toggle it off is included in case you'd like to plot multiple plots on top of each other.
The below code plots the practical max over the aqini curve of a model contained in the TransformedOutcome object up
, then overlays the aqini curve of a second model contained in up1
, also changing the line color.
Error bars
It is often useful to obtain error bars on your qini curves. We've implemented two ways to do this:
up.shuffle_fit()
: Seeds the `train_test_split`, fit the model over the new training data, and evaluate on the new test data. Average these curves.up.noise_fit()
: Randomly shuffle the labels independently of the features and fit a model. This can help distinguish your evaluation curves from noise.
Adjustments can also be made to the aesthetics of these curves by passing in dictionaries that pass down to plot elements. For example, `shuffle_band_kwargs` is a dictionary of kwargs that modifies the `fill_between` shaded error bar region.
With UpliftEval
UpliftEval
The UpliftEval
class can also independently be used to apply the above evaluation visualizations and calculations. Note that the up
object uses UpliftEval
to generate the plots, so the UpliftEval
class object for the train set and test set can be obtained in up.train_results_
and up.test_results_
, respectively.
It generally functions the same as the up.plot()
function, except error bars cannot be obtained. Note that UpliftEval
could still be used, however, to manually generate the curves that can be aggregated to make error bars.
Last updated