Evaluator
This document describes how to perform model evaluation. Users can use EvalEngine
to evaluate models independently, or configure an evaluator within the training engine to run evaluations during training.
def eval_flow(batch):
    # Run the policy model on the batch, then score the policy output
    # with two reward models.
    p = policy.forward_step(batch)
    r = reward.eval_step(p)
    r1 = reward2.eval_step(p)
    return r, r1

evaluator = Evaluator(eval_flow)
evaluator.set_dataset(prompts)
results = evaluator.eval()
In the above example, we constructed an evaluation flow across three models. Users can customize the evaluation execution flow through the eval_flow function.
The result returned by evaluator.eval is a dict, where each key is a model_name and the corresponding value is a list containing the results computed for each batch.
In the above example, the result returned by eval will be {"reward": [batch0, batch1, batch2], "reward2": [batch0, batch1, batch2]}.
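Since each per-model result is a plain Python list with one entry per batch, it can be post-processed directly. The following is a minimal sketch of consuming the dict returned above; the print-based aggregation is only illustrative.

# Minimal sketch of consuming the dict returned by evaluator.eval();
# the logic below is illustrative only.
reward_results = results["reward"]    # one entry per evaluated batch
reward2_results = results["reward2"]  # one entry per evaluated batch
for batch_idx, batch_result in enumerate(reward_results):
    # batch_result holds whatever reward.eval_step returned for this batch
    print(f"batch {batch_idx}: {batch_result}")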
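As mentioned at the beginning, an evaluator can also be configured within the training engine so that evaluation runs during training. The sketch below assumes an existing training engine object and a set_evaluator method; these names are assumptions here, so check the training engine's API for the exact call.

# Sketch of attaching the evaluator to a training engine so that
# evaluation runs during training. `engine` and `set_evaluator` are
# assumed names; verify them against the training engine's API.
evaluator = Evaluator(eval_flow)
evaluator.set_dataset(prompts)
engine.set_evaluator(evaluator)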