AI Spaceship

Evaluation & Testing for AI Systems

5 modules

Learn to measure, test, and improve AI systems through hands-on labs: metrics, benchmarks, unit tests for LLMs, adversarial testing, regression suites, and production monitoring.

Course Snapshot

Modules: 5
Lessons: 17

Module 1: Foundations — Why Evaluation Matters

3 lessons

The Evaluation Mindset
Accuracy, Precision, and Recall
Building a Scoring Function

Module 2: Text & Generation Metrics

4 lessons

Exact Match and F1
BLEU and ROUGE
Semantic Similarity Scoring
Custom Rubric Graders

Module 3: LLM-as-Judge

3 lessons

The LLM-as-Judge Pattern
Designing Judge Prompts
Calibrating and Validating Judges

Module 4: Test Suites & Regression Testing

4 lessons

Building an Eval Dataset
Assertion-Based Testing
Regression Detection
Statistical Significance

Module 5: Adversarial & Robustness Testing

3 lessons

Prompt Injection Detection
Perturbation Testing
Edge Cases and Boundary Testing