Confusion Matrix
Visualize classification performance with a confusion matrix
Confusion Matrix for Multi-Class Classification
A confusion matrix is a key tool for understanding the performance of classification tasks, especially when you have more than two possible output categories (multi-class). It summarizes how often each class was correctly predicted versus misclassified.
Here’s how to read a confusion matrix, using an intent classification prompt with five possible intents as an example (a code sketch for computing one follows the list):
- Rows: Expected (ground truth) categories
- Columns: System output (predicted categories)
- Diagonal cells: Correct predictions; off-diagonal cells indicate mistakes
- Per-class Precision, Recall, F1 Score: Show which categories need improvement
- Overall metrics: Quick view of total accuracy and F1
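To make this concrete, here is a minimal sketch of computing these numbers yourself with scikit-learn. The five intent labels and the toy data are illustrative assumptions, not output from Asserto AI:

```python
# Minimal sketch: confusion matrix + per-class metrics for a 5-intent classifier.
# Requires scikit-learn (pip install scikit-learn). Labels and data are illustrative.
from sklearn.metrics import confusion_matrix, classification_report

INTENTS = [
    "Order Management",
    "Executive Summary",
    "Billing",       # hypothetical label
    "Tech Support",  # hypothetical label
    "Other",         # hypothetical label
]

# Ground-truth intents vs. what the prompt actually returned (toy data).
expected = ["Order Management", "Billing", "Executive Summary",
            "Tech Support", "Other", "Billing"]
predicted = ["Order Management", "Billing", "Order Management",
             "Tech Support", "Other", "Tech Support"]

# Rows = expected (ground truth), columns = predicted, matching the layout above.
cm = confusion_matrix(expected, predicted, labels=INTENTS)
print(cm)

# Per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(expected, predicted, labels=INTENTS, zero_division=0))
```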
Example: Intent Classification & LLM Tool-Use
Suppose your LLM prompt classifies user queries into intents such as “Order Management” or “Executive Summary.” This is a classic multi-class classification problem.
LLM “tool-use” (function calling) can be evaluated the same way: the LLM chooses one function from a set (e.g., `getWeather`, `lookupOrder`, `summarizeReport`). Each tool or function is treated as a class, and you can use a confusion matrix to analyze accuracy and errors for each function.
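Under that framing, scoring tool selection reduces to comparing the expected function name against the one the model actually called. A minimal sketch, assuming the chosen function names have already been extracted from each response as (expected, actual) pairs (how you extract them depends on your LLM client):

```python
# Sketch: treat each tool/function name as a class and score tool selection.
# The call data below is toy data for illustration.
from collections import Counter

TOOLS = ["getWeather", "lookupOrder", "summarizeReport"]

# (expected tool, tool the model actually called)
calls = [
    ("getWeather", "getWeather"),
    ("lookupOrder", "lookupOrder"),
    ("summarizeReport", "lookupOrder"),  # a misrouted call
    ("getWeather", "getWeather"),
]

# Tally each (expected, actual) pair; diagonal pairs are correct selections.
counts = Counter(calls)
for expected in TOOLS:
    row = [counts[(expected, actual)] for actual in TOOLS]
    print(f"{expected:>16}: {row}")

accuracy = sum(counts[(t, t)] for t in TOOLS) / len(calls)
print(f"tool-selection accuracy: {accuracy:.2f}")
```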
Using Confusion Matrices in Asserto AI
You can easily set up and visualize a confusion matrix for your classification or tool-use prompts in Asserto AI. Here’s how:
1. Add a Metric in Prompt Workbench: On your prompt’s page, go to the Config tab. Under Metrics, click Add Metric.
2. Configure as Confusion Matrix: Set the visualization type to Confusion Matrix. Enter all possible classes/labels your prompt can output (e.g., intent types or function names).
3. Create a Test Case: Switch to the Test Cases tab. Add a new test case with input data.
4. Add an Assertion Using Your Metric: Within the test case, add an assertion. When selecting the metric, choose the confusion matrix metric you created.
5. Run Your Tests: Go to the Test Runner tab and run your test suite.
6. Review Results in the Session Page: After running tests, open the Session Page in the dashboard. Here you’ll see the confusion matrix visualization, with per-class and overall accuracy metrics.
Confusion matrices help you evaluate classification prompts, whether they return plain text or JSON, as well as tool/function-calling prompts with multiple categories.
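When the prompt returns JSON rather than plain text, the predicted class first has to be pulled out of the model’s output before it can be scored. A minimal sketch of that step, assuming the output carries the label in an `intent` field; the field name and the fallback label are assumptions, not an Asserto AI requirement:

```python
# Sketch: extract the predicted class from a JSON model output before scoring.
# Assumes the prompt is instructed to return {"intent": "<label>"}.
import json

def predicted_class(raw_output: str, fallback: str = "Other") -> str:
    try:
        return json.loads(raw_output)["intent"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed output still needs a class so the matrix stays complete.
        return fallback

print(predicted_class('{"intent": "Order Management"}'))  # Order Management
print(predicted_class("not json at all"))                 # Other
```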
Use these visualizations to iterate quickly and confidently on your prompt and function-calling logic!