How Scoring Works
When an evaluation pipeline finishes running, PromptLayer automatically calculates a score based on the results. Two scoring methods are available: simple scores and matrix scores.
Simple Scores
Simple scores are the default scoring method. They automatically aggregate results from selected columns in your pipeline.
How Simple Scoring Works:
- Column Selection: By default, the last column in your pipeline is used for scoring. You can select specific columns to include in the score calculation.
- Value Aggregation: For each scoring column, the system:
  - Collects all completed cell values
  - Converts values to booleans (for true/false assertions) or numbers
  - Calculates the mean of all values
- Score Types:
  - Boolean scores: Displayed as a percentage (0-100%) representing the ratio of true values
  - Numeric scores: Displayed as the average of all numeric values
- Final Score: If multiple columns are selected for scoring, the final score is the mean of all column scores. You will see a breakdown of each column’s contribution to the overall score.
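The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not PromptLayer's implementation: the function names `column_score` and `simple_score`, the column names, and the use of `None` to mark an incomplete cell are all assumptions made for the example.

```python
from statistics import mean


def column_score(values):
    """Mean of one column's completed cells (hypothetical helper).

    Assumption: a cell that has not completed is represented as None.
    """
    completed = [v for v in values if v is not None]
    if not completed:
        return None  # nothing to score in this column
    # bool is a subclass of int in Python, so True/False average as 1/0.
    return mean(float(v) for v in completed)


def simple_score(columns):
    """Return (final score, per-column scores) for the selected columns."""
    per_column = {name: column_score(vals) for name, vals in columns.items()}
    scored = [s for s in per_column.values() if s is not None]
    final = mean(scored) if scored else None
    return final, per_column


# Two boolean assertion columns; one cell in the second is incomplete.
final, per_column = simple_score({
    "exact_match": [True, True, False, True],
    "format_ok": [True, False, None],
})
print(per_column, final)
```

Here `exact_match` averages to 0.75 (shown as 75%), `format_ok` averages to 0.5 over its two completed cells, and the final score is the mean of the two, 0.625.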
Matrix Scores
Matrix scores provide advanced scoring capabilities using custom code. This allows you to implement complex scoring logic, weighted averages, or custom business rules.
How Matrix Scoring Works:
- Custom Code Execution: You provide Python (or JavaScript) code that receives all evaluation data
- Data Access: Your code receives a `data` variable containing all row results
- Score Calculation: Your code must return:
  - A `score` key with a numeric value (required)
  - Optionally, a `score_matrix` for detailed scoring breakdowns. You can provide multiple matrices if needed
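As a rough illustration of custom scoring logic, the sketch below computes a weighted average and returns it under the required `score` key. Several things here are assumptions rather than documented behavior: that `data` is a list of dictionaries keyed by column name, that an incomplete cell is `None`, and the column names and weights themselves. The logic is wrapped in a hypothetical `weighted_score` function so it can run on its own. The optional `score_matrix` is left out because its format is not described in this section.

```python
def weighted_score(data, weights):
    """Weighted average over selected columns, scaled to 0-100.

    Assumption: `data` is a list of row dicts keyed by column name,
    and cells without a result are None.
    """
    total = 0.0
    weight_sum = 0.0
    for row in data:
        for column, weight in weights.items():
            value = row.get(column)
            if value is None:
                continue  # skip cells with no result
            total += weight * float(value)  # True/False count as 1/0
            weight_sum += weight
    score = 100 * total / weight_sum if weight_sum else 0.0
    return {"score": score}  # "score" is the required key


# Hypothetical rows: "accuracy" counts twice as much as "tone".
rows = [
    {"accuracy": True, "tone": True},
    {"accuracy": True, "tone": False},
    {"accuracy": False, "tone": None},
]
result = weighted_score(rows, {"accuracy": 2, "tone": 1})
print(result)
```

With these sample rows the weighted hits sum to 5 out of a possible 8, so the returned score is 62.5.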
- A

