Overview

The Evaluator class evaluates prompts against a given testset: a generator produces predictions, a per-item metric scores each prediction, and a global metric aggregates the individual results into an overall score. It also handles error limits, progress tracking, and result presentation.
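
Below is a minimal usage sketch. The constructor keyword arguments are assumed to map one-to-one to the attributes documented below, and MyGenerator, ExactMatchMetric, and AverageGlobalMetric are hypothetical placeholders for concrete BaseGenerator, BaseMetric, and BaseGlobalMetric implementations.

    import asyncio

    async def main():
        evaluator = Evaluator(
            testset=testset,                      # List[DatasetItem], prepared elsewhere
            generate=MyGenerator(),               # placeholder BaseGenerator subclass
            metric=ExactMatchMetric(),            # placeholder BaseMetric subclass
            global_metric=AverageGlobalMetric(),  # placeholder BaseGlobalMetric subclass
            display_progress=True,
            display_table=10,        # show at most 10 result rows
            max_errors=5,
            batch_size=8,
            return_only_score=True,
        )
        # __call__ is asynchronous, so the evaluator is awaited directly.
        score = await evaluator(prompt)  # prompt: a Prompt object, built elsewhere
        print(f"Global score: {score}")

    asyncio.run(main())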

Attributes

  • testset (List[DatasetItem]): The dataset to be used for evaluation.
  • generate (BaseGenerator): The generator used to create predictions from prompts.
  • metric (BaseMetric): The metric used to evaluate individual predictions.
  • global_metric (BaseGlobalMetric): The metric used to compute an overall score from the individual results.
  • display_progress (bool): Whether to display a progress bar during evaluation.
  • display_table (Union[bool, int]): Whether to display a results table; an int limits how many rows are shown.
  • max_errors (int): Maximum number of errors allowed before evaluation stops.
  • batch_size (int): Number of concurrent tasks to run during evaluation.
  • return_only_score (bool): Whether to return only the final score rather than detailed results.
  • error_count (int): Running count of errors encountered during evaluation.
  • total_score (float): Cumulative score of all processed items in the testset.

Methods

__call__

Asynchronously evaluates a prompt using the configured testset and metrics. Each optional argument below overrides the corresponding configured attribute for this call.

Parameters:

  • prompt (Prompt): Prompt object to be evaluated.
  • testset (Optional[List[DatasetItem]]): Optional testset to override the configured one.
  • display_progress (Optional[bool]): Whether to display a progress bar.
  • display_table (Optional[Union[bool, int]]): Whether to display a results table; an int limits the rows shown.
  • max_errors (Optional[int]): Maximum number of errors allowed.
  • batch_size (Optional[int]): Number of concurrent tasks.
  • return_only_score (Optional[bool]): Whether to return only the final score.
  • **kwargs: Additional keyword arguments.

Returns: Union[float, Tuple[List[Union[Dict, str]], List[MetricResult], GlobalMetricResult]]: The global score alone when return_only_score is set; otherwise a tuple of the predictions, the per-item metric results, and the global metric result.
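
A sketch of the two return shapes, inside an async context and assuming the per-call arguments mirror the constructor attributes as described above:

    # Only the global score:
    score = await evaluator(prompt, return_only_score=True)  # -> float

    # Detailed results, with a per-call testset override (holdout_testset is
    # a hypothetical List[DatasetItem] defined elsewhere):
    predictions, metric_results, global_result = await evaluator(
        prompt,
        testset=holdout_testset,
        display_table=False,
        return_only_score=False,
    )
    # predictions: List[Union[Dict, str]] produced by the generator
    # metric_results: List[MetricResult], one per testset item
    # global_result: GlobalMetricResult aggregating the individual scores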

_process_testset

Asynchronously processes the testset and computes the individual metric results.

_bounded_process_item

Asynchronously processes a single testset item under the concurrency limit; a sketch of how these two methods could fit together follows.
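
The internals are not shown in this reference, but the two methods suggest a common asyncio pattern: a Semaphore sized by batch_size caps concurrency, and an error counter is checked against max_errors. The sketch below is a plausible reconstruction, not the actual implementation; the call shapes of generate and metric and the .score field are assumptions.

    import asyncio

    async def _process_testset(self, prompt, testset):
        semaphore = asyncio.Semaphore(self.batch_size)  # at most batch_size tasks at once
        tasks = [
            self._bounded_process_item(semaphore, prompt, item)
            for item in testset
        ]
        return await asyncio.gather(*tasks)

    async def _bounded_process_item(self, semaphore, prompt, item):
        async with semaphore:  # concurrency control
            try:
                prediction = await self.generate(prompt, item)  # assumed call shape
                result = await self.metric(item, prediction)    # assumed call shape
                self.total_score += result.score                # assumed MetricResult field
                self._update_progress()
                return prediction, result
            except Exception:
                self.error_count += 1
                if self.error_count > self.max_errors:
                    raise  # too many failures: abort the evaluation
                return None, None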

_update_progress

Updates the progress bar with the current evaluation status.

_display_results_table

Displays a formatted table of evaluation results when display_table is enabled.
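
The rendering backend is not specified here; a hypothetical sketch using the rich library might look like the following. Note that bool is a subclass of int in Python, so the bool case must be checked first when interpreting display_table.

    from rich.console import Console
    from rich.table import Table

    def _display_results_table(self, predictions, metric_results):
        if not self.display_table:
            return
        if isinstance(self.display_table, bool):
            limit = len(predictions)        # True: show every row
        else:
            limit = self.display_table      # int: cap the number of rows
        table = Table(title="Evaluation results")
        table.add_column("Prediction")
        table.add_column("Score", justify="right")
        for prediction, result in list(zip(predictions, metric_results))[:limit]:
            table.add_row(str(prediction), f"{result.score:.3f}")  # assumed .score field
        Console().print(table)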