Evaluator
A class for evaluating prompts using a given testset, metric, and global metric.
Overview
The Evaluator class provides a framework for evaluating prompts using a given testset, metric, and global metric. It handles the evaluation process, including error handling, progress tracking, and result presentation.
Attributes
The dataset to be used for evaluation.
The generator used to create predictions from prompts.
The metric used to evaluate individual predictions.
The metric used to compute an overall score from individual results.
Whether to display a progress bar during evaluation.
Whether to display a results table, and how many rows to show if limited.
Maximum number of errors allowed before stopping evaluation.
Number of concurrent tasks to run during evaluation.
Whether to return only the final score or more detailed results.
Counter for the number of errors encountered during evaluation.
Cumulative score of all processed items in the testset.
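The relationship between the per-item metric and the global metric can be illustrated with a minimal sketch. The function names below (exact_match, average_score) are hypothetical and are not part of the Evaluator API; they only show one plausible way an individual score and an overall score could relate.

```python
from typing import List


def exact_match(prediction: str, expected: str) -> float:
    """Hypothetical per-item metric: 1.0 on an exact match after trimming whitespace."""
    return 1.0 if prediction.strip() == expected.strip() else 0.0


def average_score(scores: List[float]) -> float:
    """Hypothetical global metric: mean of the individual scores (0.0 if empty)."""
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative (prediction, expected) pairs; not real evaluation data.
pairs = [("4", "4"), ("Paris", "paris"), (" ok ", "ok")]
scores = [exact_match(p, e) for p, e in pairs]
overall = average_score(scores)
```

A real global metric may weight or aggregate results differently; averaging is only the simplest assumption.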
Methods
Asynchronous method to evaluate a prompt using the configured testset and metrics.
Parameters:
prompt (Prompt): Prompt object to be evaluated.
testset (Optional[List[DatasetItem]]): Optional testset to override the configured one.
display_progress (Optional[bool]): Whether to display a progress bar.
display_table (Optional[Union[bool, int]]): Whether and how to display the results table.
max_errors (Optional[int]): Maximum number of errors allowed.
batch_size (Optional[int]): Number of concurrent tasks.
return_only_score (Optional[bool]): Whether to return only the final score.
**kwargs: Additional keyword arguments.
Returns: Union[float, Tuple[List[Union[Dict, str]], List[MetricResult], GlobalMetricResult]]: Evaluation results, which can be a single score or more detailed results depending on configuration.
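Because the return value is either a single float or a three-element tuple, calling code usually needs to branch on its shape. The following is a minimal sketch of that branching; describe_result and the EvalResult alias are hypothetical helpers, and plain Any stands in for MetricResult and GlobalMetricResult so the example runs without the library.

```python
from typing import Any, List, Tuple, Union

# Hypothetical alias mirroring the documented return type.
EvalResult = Union[float, Tuple[List[Any], List[Any], Any]]


def describe_result(result: EvalResult) -> str:
    """Summarize either shape of evaluation result in one line."""
    if isinstance(result, (int, float)):
        # Score-only mode: a single number.
        return f"score={float(result):.3f}"
    # Detailed mode: predictions, per-item metric results, global result.
    predictions, metric_results, global_result = result
    return (
        f"{len(predictions)} predictions, "
        f"{len(metric_results)} metric results, "
        f"global={global_result!r}"
    )
```

Passing 0.5 yields a score line, while passing a tuple yields the counts of predictions and metric results along with the global result.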
Asynchronous method to process the testset and compute individual metric results.
Asynchronous method to process a single item with concurrency control.
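Concurrency control of this kind is commonly built on asyncio.Semaphore. The sketch below is an assumption about how batch_size and max_errors could interact, not the library's actual implementation; run_bounded, process_one, and worker are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def run_bounded(
    items: List[int],
    worker: Callable[[int], Awaitable[float]],
    batch_size: int = 4,
    max_errors: int = 2,
) -> List[Optional[float]]:
    """Run worker over items with at most batch_size running at once.

    Assumption: once max_errors failures have occurred, remaining items
    are skipped and reported as None.
    """
    semaphore = asyncio.Semaphore(batch_size)
    error_count = 0

    async def process_one(item: int) -> Optional[float]:
        nonlocal error_count
        async with semaphore:  # limits the number of concurrent workers
            if error_count >= max_errors:
                return None  # error budget exhausted: skip this item
            try:
                return await worker(item)
            except Exception:
                error_count += 1
                return None

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(process_one(i) for i in items))
```

Failed or skipped items appear as None in the result list, so a caller can still align results with the original testset order.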
Method to update the progress bar with current evaluation status.
Method to display a formatted table of evaluation results, if configured.