Human-in-the-Loop Evaluation in Speech AI