Quickstart

Get started here. We will assess a payment default prediction model for gender fairness using Lens, in about 5 minutes. More in-depth information can be found in the Lens FAQ.

Setup

Lens installation instructions can be found on readthedocs.

Find the code

Click here to download this notebook.

Get your ML environment ready

In this tutorial we will emulate the modeling phase by running a quick script. This script loads a dataset, splits it into training and testing, and fits a model. You can see the full script here.

Here we have a gradient boosted classifier trained on the UCI Credit Card Default Dataset.

[1]:
# model, X_test, y_test, etc. are defined by this script
from credoai.datasets import fetch_credit_model

X_test, y_test, sensitive_features_test, model = fetch_credit_model()

Imports

[2]:
# Import Lens and necessary artifacts
from credoai.lens import Lens
from credoai.artifacts import ClassificationModel, TabularData

In Lens, the classes that evaluate models and/or datasets are called evaluators. In this example we are interested in evaluating the model’s fairness. For this we can use the ModelFairness evaluator. We’ll also evaluate the model’s performance with the Performance evaluator.

[3]:
from credoai.evaluators import ModelFairness, Performance

Lens in 5 minutes

Below is a basic example where our goal is to evaluate the above model. We break this code down step by step in the Breaking Down The Steps section.

Briefly, the code is doing four things:

  • Wrapping ML artifacts (like models and data) in Lens objects

  • Initializing an instance of Lens. Lens is the main object that performs evaluations. Under the hood, it creates a pipeline of evaluations that are run.

  • Adding evaluators to Lens.

  • Running Lens

[4]:
# set up model and data artifacts
credo_model = ClassificationModel(name="credit_default_classifier", model_like=model)
credo_data = TabularData(
    name="UCI-credit-default",
    X=X_test,
    y=y_test,
    sensitive_features=sensitive_features_test,
)

# Initialization of the Lens object
lens = Lens(model=credo_model, assessment_data=credo_data)

# initialize the evaluator and add it to Lens
metrics = ['precision_score', 'recall_score', 'equal_opportunity']
lens.add(ModelFairness(metrics=metrics))
lens.add(Performance(metrics=metrics))

# run Lens
lens.run()
2022-10-31 14:43:01,052 - lens - INFO - Evaluator ModelFairness added to pipeline. Sensitive feature: SEX
2022-10-31 14:43:01,243 - lens - INFO - fairness metric, equal_opportunity, unused by PerformanceModule
2022-10-31 14:43:01,250 - lens - INFO - Evaluator Performance added to pipeline.
2022-10-31 14:43:01,250 - lens - INFO - Running evaluation for step: PipelineStep(evaluator=<credoai.evaluators.fairness.ModelFairness object at 0x2a5a20fd0>, metadata={'evaluator': 'ModelFairness', 'sensitive_feature': 'SEX', 'dataset_type': 'assessment_data'})
2022-10-31 14:43:01,259 - lens - INFO - Running evaluation for step: PipelineStep(evaluator=<credoai.evaluators.performance.Performance object at 0x2a5a3dac0>, metadata={'evaluator': 'Performance'})
[4]:
<credoai.lens.lens.Lens at 0x2a5a3da30>

Getting results within your python environment

lens.get_results() returns a list in which each element contains an evaluator’s results (a list of dataframes) along with the evaluator’s metadata. In this case there are 2 results, one for each evaluator.

[5]:
results = lens.get_results()
print(f"Results for {len(results)} evaluators")
Results for 2 evaluators

lens.get_results() accepts some arguments that make it easier to get a subset of results. These are the same arguments that can be passed to lens.get_pipeline and lens.get_evidence.

[6]:
lens.get_results(evaluator_name='Performance')
[6]:
[{'metadata': {'evaluator': 'Performance'},
  'results': [              type     value
   0  precision_score  0.628081
   1     recall_score  0.360172]}]
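The same filter can also be applied to the pipeline steps via lens.get_pipeline; here is a minimal sketch (not executed in this notebook), relying on the shared arguments mentioned above.

# Sketch: get_pipeline accepts the same filtering arguments as get_results,
# here selecting only the ModelFairness step
lens.get_pipeline(evaluator_name='ModelFairness')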

Using Lens’s pipeline argument

If we want to add multiple evaluators to our pipeline, one way of doing it is to repeat the add step, as shown above. Another way is to define the pipeline steps and pass them to Lens at initialization time. Let’s explore the latter!

[7]:
pipeline = [
    (ModelFairness(metrics)),
    (Performance(metrics)),
]
lens = Lens(model=credo_model, assessment_data=credo_data, pipeline=pipeline)
2022-10-27 09:43:28,339 - lens - INFO - Evaluator ModelFairness added to pipeline. Sensitive feature: SEX
2022-10-27 09:43:28,555 - lens - INFO - fairness metric, equal_opportunity, unused by PerformanceModule
2022-10-27 09:43:28,580 - lens - INFO - Evaluator Performance added to pipeline.

Above, we passed the instantiated evaluators directly. Alternatively, each element of the list can be a tuple of the form (instantiated_evaluator, id), as in the sketch below.
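For illustration, here is a minimal sketch (not executed in this notebook) of the same pipeline with explicit ids; the id strings are arbitrary labels chosen for this example.

# Sketch: each step as an (instantiated_evaluator, id) tuple
pipeline_with_ids = [
    (ModelFairness(metrics), "fairness_by_sex"),
    (Performance(metrics), "performance_overall"),
]
# This pipeline would be passed to Lens in the same way:
# lens = Lens(model=credo_model, assessment_data=credo_data, pipeline=pipeline_with_ids)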

[8]:
# notice that Lens functions can be chained together
results = lens.run().get_results()
print(f'\nFound results for {len(results)} evaluators')
2022-10-27 09:43:28,583 - lens - INFO - Running evaluation for step: PipelineStep(evaluator=<credoai.evaluators.fairness.ModelFairness object at 0x2ac4664c0>, metadata={'evaluator': 'ModelFairness', 'sensitive_feature': 'SEX', 'dataset_type': 'assessment_data'})
2022-10-27 09:43:28,610 - lens - INFO - Running evaluation for step: PipelineStep(evaluator=<credoai.evaluators.performance.Performance object at 0x2ac45cd30>, metadata={'evaluator': 'Performance'})

Found results for 2 evaluators

Let’s check that we have results for both of our evaluators.

[9]:
results[0]
[9]:
{'metadata': {'evaluator': 'ModelFairness',
  'sensitive_feature': 'SEX',
  'dataset_type': 'assessment_data'},
 'results': [                     type     value
  0       equal_opportunity  0.027686
  1  precision_score_parity  0.016322
  2     recall_score_parity  0.027686,
        SEX             type     value
  0  female  precision_score  0.618687
  1    male  precision_score  0.635009
  2  female     recall_score  0.344585
  3    male     recall_score  0.372271]}
[10]:
results[1]
[10]:
{'metadata': {'evaluator': 'Performance'},
 'results': [              type     value
  0  precision_score  0.628081
  1     recall_score  0.360172]}

That’s it!

That should get you up and running. Next steps include:

  • Trying out other evaluators (they are all accessible via credoai.evaluators)

  • Checking out our developer guide to better understand the Lens ecosystem and see how you can extend it.

  • Exploring the Credo AI Governance Platform, which will connect AI assessments with customizable governance to support reporting, compliance, multi-stakeholder translation and more!

Breaking Down The Steps

Preparing artifacts

Lens interacts with “AI Artifacts” which wrap model and data objects and standardize them for use by different evaluators.

Below we create a ClassificationModel artifact. This is a light wrapper for any kind of fitted classification model-like object.

We also create a TabularData artifact which stores X, y and sensitive features.

[11]:
# set up model and data artifacts
credo_model = ClassificationModel(name="credit_default_classifier", model_like=model)

credo_data = TabularData(
    name="UCI-credit-default",
    X=X_test,
    y=y_test,
    sensitive_features=sensitive_features_test,
)

Model type objects, like the ClassificationModel used above, serve as adapters between arbitrary models and the evaluators in Lens. Some evaluators depend on the Model exposing certain methods. For example, ClassificationModel can accept any generic object having predict and predict_proba methods, including fitted sklearn pipelines.
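For illustration, here is a minimal sketch (not part of this tutorial’s script) that wraps a fitted sklearn pipeline in ClassificationModel; the pipeline itself is a hypothetical stand-in for any object exposing predict and predict_proba.

# Hypothetical example: any fitted object exposing predict and predict_proba
# can be wrapped in ClassificationModel, including a fitted sklearn pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

sk_pipeline = Pipeline(
    [("scaler", StandardScaler()), ("clf", LogisticRegression())]
)
sk_pipeline.fit(X_test, y_test)  # illustration only; normally fit on training data

credo_sk_model = ClassificationModel(
    name="sklearn_pipeline_example", model_like=sk_pipeline
)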

Data type artifacts, like TabularData, serve as adapters between datasets and the evaluators in Lens.

When you pass data to a Data artifact, the artifact performs several validation steps and formats the data so that it can be used by evaluators. The aim of this procedure is to preempt errors down the line.

You can pass Data to Lens as a training dataset or an assessment dataset (see the Lens class documentation). If passed as training data, it will not be used to assess the model; instead, dataset assessments (e.g., a fairness assessment) will be performed on the dataset itself. The assessment dataset is assessed in the same way, but is also used to assess the model. A sketch of passing both datasets follows.
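As a sketch (not executed in this notebook), assuming a separate training split (X_train, y_train, sensitive_features_train, which this quickstart does not load), both datasets could be passed to Lens:

# Hypothetical: wrap a training split (not loaded in this quickstart) and
# pass it to Lens alongside the assessment data
credo_training_data = TabularData(
    name="UCI-credit-default-train",
    X=X_train,
    y=y_train,
    sensitive_features=sensitive_features_train,
)

lens_with_training = Lens(
    model=credo_model,
    assessment_data=credo_data,          # assessed and used to assess the model
    training_data=credo_training_data,   # assessed, but not used to assess the model
)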

Similarly to Model type objects, Data objects can be customized; see !!insertlink!!

Evaluators

Lens uses the above artifacts to ensure a successful run of the evaluators. As we have seen in the sections Lens in 5 minutes and Using Lens’s pipeline argument, multiple evaluators can be added to the Lens pipeline. Each evaluator contains information on what it needs in order to run successfully, and it executes a validation step at add time.

The result of the validation depends on which artifacts are available, their content, and the type of evaluator being added to the pipeline. If the validation process fails, the user is notified of the reason why the evaluator cannot be added to the pipeline.

See for example:

[12]:
from credoai.evaluators import Privacy
lens.add(Privacy())
2022-10-27 09:43:28,636 - lens - INFO - Evaluator Privacy NOT added to the pipeline: Missing object training_data
[12]:
<credoai.lens.lens.Lens at 0x2ac486d30>

Currently, no automatic running of evaluators is supported. However, when Lens is used in combination with the Credo AI Platform, it is possible to download an assessment plan, which is then converted into a set of evaluations that Lens can run programmatically. For more information see the governance tutorial.

Run Lens

After we have initialized Lens with the Model and Data type artifacts (ClassificationModel and TabularData in our example), we can add whichever evaluators we want to the pipeline, and finally run it!

[13]:
lens = Lens(model=credo_model, assessment_data=credo_data)
metrics = ['precision_score', 'recall_score', 'equal_opportunity']
lens.add(ModelFairness(metrics=metrics))
lens.run()
2022-10-27 09:43:28,896 - lens - INFO - Evaluator ModelFairness added to pipeline. Sensitive feature: SEX
2022-10-27 09:43:28,896 - lens - INFO - Running evaluation for step: PipelineStep(evaluator=<credoai.evaluators.fairness.ModelFairness object at 0x2ac509310>, metadata={'evaluator': 'ModelFairness', 'sensitive_feature': 'SEX', 'dataset_type': 'assessment_data'})
[13]:
<credoai.lens.lens.Lens at 0x2ac4e00d0>

As you may have noticed, evaluators need to be instantiated when they are added to Lens. If any extra arguments need to be passed to the evaluator (like metrics in this case), this is the time to do it.

Getting Evaluator Results

After the pipeline is run, the results become accessible via the method get_results().

lens.get_results() returns a list where the results of each evaluator are stored along with the evaluator’s metadata.

We did not specify an id when adding ModelFairness to the pipeline; id is an optional argument of the add method, and if omitted a random one will be generated. A sketch of passing an explicit id is shown below.
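For illustration, here is a minimal sketch (not executed in this notebook) of adding an evaluator with an explicit id, assuming the add method accepts it as the id keyword:

# Sketch: "fairness_by_sex" is an arbitrary label chosen for this example;
# if id is omitted, a random one is generated
another_lens = Lens(model=credo_model, assessment_data=credo_data)
another_lens.add(ModelFairness(metrics=metrics), id="fairness_by_sex")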

[14]:
lens.get_results()
[14]:
[{'metadata': {'evaluator': 'ModelFairness',
   'sensitive_feature': 'SEX',
   'dataset_type': 'assessment_data'},
  'results': [                     type     value
   0       equal_opportunity  0.027686
   1  precision_score_parity  0.016322
   2     recall_score_parity  0.027686,
         SEX             type     value
   0  female  precision_score  0.618687
   1    male  precision_score  0.635009
   2  female     recall_score  0.344585
   3    male     recall_score  0.372271]}]

Credo AI Governance Platform

For information on how to interact with the platform, please see the Connecting with Governance App tutorial.